Welcome 



AUSTRALIAN OPEN SYSTEMS USERS GROUP 


On behalf of AUUG Inc., welcome to the Perth AUUG Summer Technical Conference 
1993. 


Despite our relative isolation in Perth, today we will hear several world-class speakers
who have presented, or will shortly present, their papers at international conferences. We will
also hear some local speakers speaking for the first time at this conference. I hope that
this mix of experienced and new speakers will continue at future conferences.

This is the fourth consecutive AUUG Summer Technical Conference to be held in
Perth. The first was organised by Glenn Huxtable of UWA, who is the national
organiser of AUUG's Summer Conference programme. The second and third were
organised by Alan Main of Functional Software. The success of this conference builds
upon both the work of these gentlemen in the past and the assistance that they have
provided to me this year; my personal thanks to them.

Several innovations have been introduced for this conference:

• The provision of two high-calibre interstate speakers; 

• The running of a low-cost tutorial programme in conjunction with the conference; 

• The production of a set of proceedings for the conference; 

• The hosting of an informal cocktail event at the close of the conference.

I would like to make special mention of the fact that the last two of these new
features have been made possible by the generous sponsorship of Silicon Graphics and
Sun Microsystems, respectively. Without this sponsorship it would have been
impossible to provide these extra benefits without increasing conference fees.

You may have noticed some conference evaluation forms at each table. I hope you will
take a few moments to fill one of these out at the close of the conference. Your
comments and suggestions will be used to make the Perth AUUG Summer Technical
Conference 1994 even more successful.


Adrian Booth 

Perth AUUG Summer Conference 1993 Organiser 

AUUG INC. SECRETARIAT
PO BOX 366, KENSINGTON NSW 2033
70 GLENMORE ROAD, PADDINGTON NSW 2021, AUSTRALIA
INC. IN VICTORIA, ACN A0016636N
PHONE +61 2 361 5994 FAX +61 2 332 4066






Summer Technical Conference '93 
Perth, Western Australia


CONFERENCE PROGRAMME 
Friday, 16 April 1993, 
Orchard Hotel 


Late registration will be accepted until 9:00am 



UNIX® AND OPEN SYSTEMS USERS 


8:30am Registration - Coffee and Tea available 
9:00am Welcome 


9:10am Report on USENIX 1993 Winter Conference 

Greg Rose, Australian Computing and Communications Institute 

9:30am The New Security Paradigm 

Chris Schoettle, UNIX System Laboratories, Australia & New Zealand 

10:00am On-Line Transaction Processing Comes to UNIX® 

Chris Schoettle, UNIX System Laboratories, Australia & New Zealand 

10:30am Morning Tea 

10:45am UNIX Network Backup and Archival Strategies 
Paul Templeman, Sequel Technology 

11:30am Distributed Object Management 

Harald Reiss, Reiss Dynamics Enterprise 

12:15pm Lunch 

1:00pm Homebrew Network Monitoring: a Prelude to Network Management
Mike Schulze, Curtin University of Technology 

1:30pm How Does My Code Know When it is Running? 

Dr. Chris McDonald, University of Western Australia 

2:30pm Afternoon Tea 


2:45pm A History of UNIX 

Greg Rose, Australian Computing and Communications Institute 



4:15pm Close Conference














1. Report on Usenix, San Diego 25th-29th January 1993. 

Greg Rose, Australian Computing and Communications Institute 


I arrived in San Diego from New York on Saturday. The main reasons for me to arrive 
early were firstly, because the Usenix Board of Directors meeting was scheduled for 
Sunday, and secondly, given the choice between New York or San Diego in winter, 
well... 


2. Sunday... Board Meeting. 


Attendance at Usenix’s "big" conference is declining, and this is seen to be a worrying 
thing, but it is most likely to be because the associated smaller conferences are doing 
extremely well. For example, the Large Installation System Administration conference, 
and the C++ conference, are both about half the size of the main conference. The 
program for this particular conference is seen to be extremely strong, especially with a 
number of great talks from Australia. 

Elizabeth Zwicky reported about SAGE (the System Administrators’ Guild), Local
Technical Groups (regional groups), Special Technical Groups (SIGs), and
International Affiliate Groups (e.g. Sage-AU). More about Sage later; for now suffice it
to say that Usenix is happy that Sage-AU is up and running, and wants to cooperate.

Usenix had planned an Application Development Symposium, but that was cancelled.
It was supposed to be a joint venture with UniForum Canada, and the logistics got too
hard, but there was a reasonable amount of interest. "Symposium" was probably too
highfalutin a word for this thing. This is something I think AUUG could do well.

Finally, after a number of years of being shunned, Usenix finds that UniForum,
following a change of management, is now amenable to getting friendly again. Initially, this
cooperation will probably take the forms of jointly funding the POSIX standards
watchdog function, and hosting a major Usenix event (possibly the next LISA) at the
same time and place as the UniForum in San Francisco, in early 1994.

Personally, I think these previous points show that AUUG was right to try and avoid 
splitting into technical versus commercial subgroups. UniForum has bet their farm on 
Unix, and really are not in a position to move towards other "open" systems. Usenix 
on the other hand have diversified quite successfully, but rely probably too much on 
the moral and technical high ground. Now they seem to need each other again. (End 
personal opinion.) 

Mick Farmer reported on the new structure of EurOpen. Like most things in Europe,
this group (formerly the European Unix Users Group) is undergoing rapid and
disruptive change. EurOpen was structured as an umbrella organisation, but also
undertook to organise their own conferences to some extent. The Secretariat was
moved from Owles Hall in England to Brussels, Belgium, and a bunch of employees
were terminated at the same time. A couple of member countries refused to pay their
dues, and EurOpen was looking at bankruptcy this year. So, a quick restructuring
resulted in "EurOpen Lite" which seems to merely coordinate and distribute
information between the member countries’ groups (and no longer competes with
them). It is too soon to say whether this will succeed or not, but it is a valiant effort to
save the otherwise doomed organisation. AUUG seems better all the time...

Usenix’s Board were all surprised to hear that 32% of their members were from
overseas (which doesn’t include Canada). Peter Collinson (U.K.) and I, who regularly 
attend these meetings, had a bit of a giggle about this, as we have certainly known it 
for some time. 

When the board meeting moved into closed session (i.e. they threw out the hangers-on 
like me) I went around to the registration area and picked up my books etc. It was 
within the first few minutes that I realised what the really hot topic of this Usenix 
conference would be. 

Background: Unix System Laboratories is suing the University of California at
Berkeley, and Berkeley Software Design, Inc., over some sort of alleged infringement
of licenses or copyrights or look and feel, or something equally intangible. In the latest 
round of this war, USL required a list of all people who have ever worked for a Unix 
Source code licensee, or who have ever seen source code of Unix or of the Berkeley 
NET-2 (re-written) release, or who have read anything about internal design or data 
structures of Unix (ever read the Bach Book, or the Lions Commentary?). The reason 
for this request is that USL asserts that any code such a person writes for an operating 
system, or in which the same algorithms are used (e.g. linear search, used extensively 
in Unix) is really the property of USL. Their deposition used the words "mentally 
contaminated" to describe such people. The court is currently deciding the status of 
this suit. (End of Background.) 

Rick Adams, who works for BSDI and UUNET, has made a bunch of badges which 
say "MENTALLY CONTAMINATED" in large red letters, and I happened to have a
pocketful that I was intending to take back to Australia. Virtually everyone who saw the
badge said that they were contaminated, and so wanted such a badge. I ran out within
minutes. (Don’t worry, I got more.) Peter Salus was wearing the UKUUG windcheater
with the famous "/* you are not expected to understand this */" comment, and pointed
out that anybody who looked at him was also forever contaminated. (In fact, if you just
read that, you are now contaminated too... fortunately most of this audience is in 
Australia and we appear to be sensible enough that we can ignore this stupidity. The 
judge here (the U.S.) is expected to rule on this point after about two weeks of 
deliberations, and no one can guess which way it will go.) 

The "launch" of the week was held at 18:00 and featured free non-alcoholic drinks, a
cash bar, and some quite good (and sometimes very spicy) Mexican munchies.

The conference T-shirt features a standard sort of San Diego advertising picture, with a 
caricature of Rob Kolstad (Conference Convener) surfing on an oversized keyboard. 
Tomorrow night there is a group getting together to sew pink tutus around him (sorry,
an "in" joke - it was proposed on the network that there should be a competition
about Rob and tutus).


3. Monday... Tutorial day 1. 


Not much to say about this day. There are about ten full day tutorials each day, and I 
attended the one about OSF DCE. The standard of the tutorials is pretty high, even if 
the information is something you would rather not hear. DCE, despite its total lack of 
real availability at this point in time, is already an important standard. Fortunately I 
was already familiar with all seventeen of the possible synchronisation primitives
which are all supported by DCE, so I floated around meeting people for the morning
and only sat in on the DCE tutorial for the last half. Dinner was a rather nice Mexican
meal from somewhere north of UCSD.


4. Tuesday... Tutorial day 2. 


Relatively little to report on this day, as I decided that I needed to fill in the gaps in 
my knowledge of Kerberos, and the tutorial by Dan Geer and Jon Rochlis was
excellent. Tuesday evening was the board meeting for USENIX/SAGE which I 
attended representing Sage/AU. 

By the end of this day, my conference badge was getting rather heavy. As well as the 
two ribbons I deserved (Invited Speaker and Conference Pilot) I had added a
"Newcomer", as it was such a pretty yellow. I had a dinosaur sticker, the
"MENTALLY CONTAMINATED" warning mentioned above, and a similar pin in the
form of a New Hampshire number plate, saying "NET2", and with an insertion mark
and AT&T inserted in the motto, viz. "Live (AT&T) Free or Die". At the end of each
ribbon was a button. One was the "Robbie Kolstad fan club", another a Henry Spencer 
badge quoting Marshall Rose "OSI Committees lack Adult Supervision", and yet 
another lawsuit badge, "NET2" circled by the words "Want To Be Sued? Ask AT&T 
How!". This badge was getting pretty heavy. The solution to the problem appears on 
the next day... 


5. Wednesday... Conference Day 1.


The keynote talk was "Pen Based Computing and its Impact", by Robert Carr of GO
Corporation, writers of the PenPoint Operating System. This talk was quite interesting,
until the last few minutes, but I’m not sure that it deserved the keynote position. I can
believe that the interface to the computer through a pen might be more productive for
many people, particularly people moving around a lot. The last five minutes of the
talk, however, were mind blowing. All the features are usable in Japanese, including
the handwriting recognition. Watching a video of Japanese characters being written and
then redrawn accurately after being recognised was impressive.

There were two minor problems with the talk. The speaker claimed that PenPoint was
the first operating system to support Unicode, at which Dave Presotto and Phil
Winterbottom, the implementers of Unicode in Plan 9, sort of choked. He also called
the Hobbit chip from AT&T a "brand new high performance RISC microprocessor",
while it is in reality the same old CRISP chip revisited.


Immediately afterward, Rob Pike gave a talk about use of Unicode in Plan 9. I can’t
reproduce the title here, as it included Japanese and Hebrew versions of "Hello world".

Many of the papers on this day were not of great interest to me, and because of the 
weight of my badge, I decided to start a new business (profit free) producing funny
name badges. Because Rob Kolstad is the current scapegoat, name badges like Greg 
Kolstad, Ken and Dennis Kolstad, Kirk Kolstad, and even Rob (Pike) Kolstad 
appeared. 

Dinner was an excellent pseudo-Italian meal while Addison-Wesley negotiated a
contract with me for my new book (that’s a plug).

This evening, there was an Open Board Meeting for Sage, in which it was announced 
that there was an intent to affiliate between Sage/US and Sage/AU. The committee for 
Sage/US is Steve Simmons (President), Pat Parsegian (Secretary), Peg Schafer 
(Treasurer), Carol Kubicki, Pat Wilson, and Paul Moriarty. There was an observation 
that there was an enormous momentum behind the formation of Sage, that the first 
year was extremely successful, and that there was a great level of excitement about the 
future of Sage. 

Sage has a number of working groups and discussion groups, which are displaying 
varying degrees of success in achieving their goals, or even specifying their goals. 
There is now a sub-committee which will review these working groups. 

What do you get for your membership in Sage? An investment in the future, and in 
the working groups. A chance to be heard as a coherent group, without being drowned 
out by the other voices. Gradually, Sage will take over the LISA conferences.


6. Thursday... Conference Day 2. 




There were a number of interesting papers given, including a talk from Plan 9 about 
the I/O system. Some people from Bell Labs had been, shall we say aggressive, to 
some of the earlier presenters, so they came in for some bashing of their own from 
Peter Honeyman and me. 

Dan Klein gave an excellent invited talk about languages for specifying specialised 
things, and the major example was the specification language for Blazons (medieval
shield designs), with parallels to PostScript. 

Immediately after lunch there was a highly acclaimed talk about "A History of Unix", 
and despite lots of people in the audience who had been closer to the events than the 
humble author, there were few disagreements over facts. 

The conference reception was different to previous Usenix events, in fact it was much 
more AUUG-like, with tables and entertainment. Unfortunately, the impromptu 
comedy foursome did a pretty poor job of making jokes about relevant issues, and
even got Ken Thompson’s name wrong (Ritchie Thomas!) in one of the jokes. The 
jokes that weren’t Usenix specific were pretty smutty and sexist, and didn’t go over 
well. I think it will be the last time Usenix tries that idea. 

After the reception, and with the pressure of my talk over, I proceeded to get (more) 
drunk, culminating in the Single Malt Scotch Working Group meeting, which went till 
well after midnight. That’s why the typing is not terribly coherent. 

This year, Usenix has introduced an annual "Keepers of the Flame" award, and the
first one was presented to the Computer Systems Research Group at the University of
California at Berkeley. The seven members of the CSRG get rather nice glass
sculptures, and corporate contributors get plaques. Then there was a huge list of major 
contributors who will get calligraphed certificates of appreciation, and Robert Elz was 
deservedly on this list. Then there was a list of about 160 other significant 
contributors, with a number of Australians on it. This is a nice idea. 


7. Friday... Conference Day 3. 


Andrew McRae opened the day, and his talk was well received by those who attended. 
I was in the parallel invited talks track, where some of the best papers from the
filesystems workshop were re-presented. Peter Honeyman from the University of
Michigan was the host and presented one of the papers, about Alex, the NFS FTP 
filesystem. 

On one of the previous days, Margo Seltzer presented a joint paper about the results of
the final implementation of the Log Structured File System. This has been one of the
great hopes on the horizon for improving the performance of file systems, but this
paper basically canned it, pointing out that the overheads of garbage collection
eventually soaked up most of the benefits for common scenarios.

As an obvious corollary to the promising initial results, this same conference had a 
number of papers about novel applications of the log-structured filesystems, which 
were no longer such wonderful things to do... 

At lunch time I got back to work and attended a meeting of Usenix’s tutorial program
committee, to get ideas for tutorials for the coming AUUG conference. (I am tutorial
coordinator for AUUG’93.) I have some good ideas, if people want to present a
tutorial but don’t know what to talk about.

As I mentioned earlier, there had been some amount of on-going Rob Kolstad bashing, 
particularly with reference to pink tutus. At the beginning of the after-lunch Invited
Talks session, which was a re-presentation of the best papers from the last LISA
conference, this was finally laid to rest (perhaps). Steve Simmons began by introducing
Rob (whose badge by now said simply "Rob *") and explaining the origin of the 
whole tutu business. He then said that Rob would be presented with a pink tutu which 
he could ceremoniously burn, and end the matter. What he didn’t mention was that the 
tutu in question was being worn by Ed Gould (190 cm tall, 140 kg wide, and hairier 
than me). This raised quite a laugh. 

At the close of the conference two equal awards were presented, for the best 
presentation. One award went to Margo Seltzer, Keith Bostic, Kirk McKusick, and 
Carl Staelin for "An Implementation of a Log Structured File System for UNIX", 
despite the depressingly negative result. The second went to Stephen Uhler for 
"PhoneStation, Moving the Telephone onto the Virtual Desktop". 

After the conference was over I went to dinner with some excessively rowdy people 
from Bell Labs, and then to a party. This was a hard week for me. No, really, I didn’t 
leave the hotel grounds except for two dinners. I only got to the hot tub once...


8. Overview of Content. 

Given three parallel tracks, and the fact that there was never a clear distinction in the 
type of the content, obviously it was impossible for me to see all the papers. I’ll 
mention here those that I thought were worthy of specific comment. I will suggest to 
the newsletter editor that a full table of contents be printed, and you can purchase 
copies of the proceedings through AUUG. 

I’ll repeat, for the record, the observation that it was an extremely good technical 
program. 




"Pen based Computing and its Impact", Robert Carr. 

See discussion above. 

"Hello World" (even Usenix proceedings shortened the title), Rob Pike. 

Describes the use of a variant of Unicode as the base character set 
for the operating system, and some of the issues that arose during 
the implementation. 

"DUEL - a Very High Level Debugging Language", by Michael Golan and David R 
Hanson.

The authors present a small language which is oriented towards 
evaluating expressions about high level data structures. It is 
interfaced to a common debugger (dbx?) and finds its chief use for 
writing things like "show me the elements that are out of order in
this array". I find the language interesting and useful, but the
application inappropriate. 

"PhoneStation, Moving the Telephone onto the Virtual Desktop", by Stephen Uhler. 

You had to be there, but reading the paper is also very rewarding, 
particularly if you have Sun SparcStations. 

"Jgraph - a Filter for Plotting Graphs in Postscript", James Plank. 

I only caught the last part of the talk, but this looked like useful 
stuff well implemented.

"Wafe - An X Toolkit Based Frontend for Application Programs in Various
Programming Languages", Gustaf Neumann & Stefan Nusser.

The X application builder wars are being fought between Tk and Wafe.
Stuff about Tk appeared a couple of Usenixes ago, and if you do any
X application programming you need to know about both of these,
although you will probably end up settling on one.

"The Design and Implementation of the Inversion File System", Michael Olson.

Inversion is a file system, accessible through NFS, that is
implemented within the Postgres database (hence the name), and has
all sorts of interesting properties regarding incorruptibility,
consistency and recovery, not to mention some interesting semantic
properties (query-language constructs in pathnames!).



"File Systems in User Space", Paul Eggert and D Stott Parker.

How (not) to hack useful but irregular and unpredictable semantics 
into your file system, by totally subverting shared libraries. Read 
it, the idea is very interesting, but the design... 

"The Organisation of Networks in Plan 9", Dave Presotto and Phil Winterbottom. 

Worth reading. Watch for things by Phil Winterbottom. 

"An Implementation of a Log Structured File System for UNIX", Margo Seltzer, Keith 
Bostic, Kirk McKusick, and Carl Staelin. 

As usual, a right scholarly piece of work that analyses the 
performance of the much touted Log Structured File System in the 
real world. Unfortunately, it seems perhaps to have been a dead end. 

There was a paper a year or so ago that used a different method to
obtain faster write performance, that perhaps now needs more
examination.

"The Nachos Instructional Operating System", Wayne A Christopher, Steven J Procter 
and Thomas E Anderson. 

I didn’t see the presentation, but the paper won the best paper 
award. 

There were lots of other papers that I hope I am not slighting by leaving them out of 
this list. There seemed to be something for everyone at this Usenix. 


The New Security Paradigm 


Chris Schoettle 
UNIX System Laboratories 
Australia and New Zealand 
Tel: +61-2-906-8953 
Fax: +61-2-436-4673 
Email: cts@usl.com


ABSTRACT 


There is a new security paradigm to secure the corporate computing 
environment in the 1990’s. The existing security paradigm utilizing physical 
security and add-on security packages is no longer sufficient. A new paradigm is 
required in which security is embodied in every object in the computing 
environment. The hardware and operating system software must be secured to 
provide a suitable foundation for secure applications. Applications must be 
developed in a secure manner and a security policy must be defined and 
enforced by management controls. 

The security model is analogous to the layering of an onion. Security is first 
achieved at the lowest level object or inner-most layer. Once this foundation is 
provided, each successive object, or layer, built above is secured until all objects 
are secure. Utilizing the mechanisms that follow the new security paradigm will 
result in a secure enterprise computing environment. 

UNIX® System V Release 4.1 Enhanced Security provides security in the 
operating system that satisfies the new security paradigm, including Mandatory 
and Discretionary Access Control, Identification and Authentication, Trusted 
Facility Management, Trusted Path, Trusted Import/Export, Networking 
Security, and Auditing. 



BIOGRAPHY 


CHRIS SCHOETTLE 


Mr. Chris Schoettle is General Manager of UNIX System Laboratories (USL) in 
Australia and New Zealand. Mr. Schoettle has extensive international 
experience in the UNIX® system market, encompassing technical, business, sales 
and marketing arenas. 

Prior to this assignment, Mr. Schoettle was the Technical Services Manager at 
UNIX System Laboratories Europe, based in London, UK. Additionally with 
USL Europe, he has held technical and sales positions, working closely with the 
worldwide Open Systems industry. He has also held positions in technical 
consultancy and services with NCR Corporation, at their UNIX systems 
development and manufacturing site in Columbia, South Carolina, USA. 

Mr. Schoettle has Bachelor and Master’s degrees in Computer Science and 
Mathematics from Emory University in Atlanta, Georgia, USA. He has 
furthered his post-graduate education at the University of St. Andrews in 
Scotland and at the University of South Carolina in the USA. 


Introduction 


The market for secure computer systems continues to be a fast paced and 
rapidly growing section of the industry. Specifications for secure systems 
come from all facets of the industry, governments both domestic and 
international, commercial enterprises, and standards organizations. 

This paper describes the role that UNIX® System V Release 4.1 Enhanced 
Security (SVR4.1ES) plays in this marketplace. The main focus is on the 
feature changes made to UNIX SVR4.1ES to satisfy the Trusted Computer
System Evaluation Criteria (TCSEC) [1] while still maintaining the look
and feel of traditional UNIX systems. A brief discussion of how SVR4.1ES 
meets major security specifications is also included. 

Computer security has existed in some form or another for as long as 
sensitive, proprietary data has resided on computers. Before open systems, the 
majority of computers in the work place were stand-alone mainframes
running some version of the vendor's proprietary operating system. The 
security typically consisted of restricting system access by requiring some 
form of authentication, usually in the form of a user ID and perhaps an 
associated password. On some systems, data set access was also restricted by 
some form of authentication. Access to any system resource required a series 
of job control language (JCL) statements. The JCL itself was complex and 
varied between different systems; thus it served as a form of access restriction 
(since the naive user was unable to access data sets or system services). In this 
type of environment, the security afforded by identification and 
authentication was sufficient. Because these computers were running 
proprietary operating systems, the source code was unpublished and not well 
known. This lack of knowledge, little or no networking, and limited
interoperability among the various proprietary operating systems made it 
difficult for rogue software to penetrate the operating system and spread to 
other systems. 

The emergence of the UNIX system and open systems changed the way in 
which computers were used and configured. The UNIX systems tended to be 
small systems configured in some kind of networked cluster. By eliminating 
the need for complex, cumbersome JCL, most system services became 
accessible to even the most naive users. Since the systems were open, the
operating system source code was published and well known. Frequently, the
system source code resided on the system and was left accessible to all users. 
These features changed the type and nature of the security threats. For 
example: 

• Since the systems were open, system software could be obtained from 
sources other than the vendor, such as a university. The non-vendor 
software was not always placed through a rigorous design, 
development, test, and documentation cycle. As such, the non-vendor 
software sometimes contained holes through which rogue users could 
gain system access or system privilege. 

• The networked environment, combined with the interoperability of 
the UNIX system, enabled the development of virus programs which 
could affect the entire network of machines. Additionally, users could 
use the network to evade detection by having the virus program 
executed remotely. 

• Since the operating system source code was well known and easily 
available, it could be examined for holes through which access could 
be gained or privilege could be obtained. While holes existed on the 
proprietary systems, the lack of access to the source code prohibited 
thorough and extensive searches of the operating system. 

• The ability to share data among users within a designated group or all 
the users on the system proved to be a very powerful, frequently abused 
feature. For example, files were commonly left accessible to all 
system users, when allowing access at the group level would have been 
sufficient. Since this type of abuse was also seen with system files (for 
example, crontab files left with a mode of 777), a new type of threat 
was introduced. 

• Since no JCL was required to access system resources, users could 
frequently run the system out of space, out of processes, or tie up the 
system console by generating extraneous system messages. This 
enabled users to violate the system and prohibited the system 
administrator from taking preventive action. 
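
As a rough illustration of the file-permission threat described above, the
following Python sketch walks a directory tree and reports regular files that
are writable by every user (for example, a file left with a mode of 777). It is
a minimal sketch only: the function name find_world_writable is hypothetical
and is not part of SVR4.1ES or of any tool mentioned in this paper.

```python
import os
import stat


def find_world_writable(root):
    """Yield regular files under root that any user on the system may modify."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            # lstat inspects the entry itself and does not follow symbolic links.
            mode = os.lstat(path).st_mode
            # S_IWOTH is the write bit for "other" users (the final 2 in 777).
            if stat.S_ISREG(mode) and mode & stat.S_IWOTH:
                yield path


if __name__ == "__main__":
    # Assumption: the current directory is the tree to be inspected.
    for path in find_world_writable("."):
        print(path)
```

An administrator might run a check of this kind over a directory of system
files and then restrict each reported file to group-level access where that
is sufficient.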

New threats such as those described above required more sophisticated
security mechanisms than the identification and authentication provided on
the mainframes. Early UNIX systems solved these problems by extending
the group mechanism (multiple groups), restricting network access, and
adding resource limits. While sufficient for many applications, these were not
sufficient for high security applications such as those used by the United
States government. Seeing the need for assured system security, the United
States government, specifically the Department of Defense, began to address
the security issues related to networked, resource-sharing computer systems.


SVR4 Enhanced Security: A Technical Overview 





Interest in Computer Security 


Interest in assured secure systems began within the United States government. 
As early as 1967, the government recognized that resource-sharing computer
systems brought a unique set of security problems that needed to be 
addressed. Growing out of efforts that largely came from Department of 
Defense (DoD) computer system needs, the National Computer Security 
Center (NCSC) was formed to identify problems and solutions for building, 
evaluating, and auditing secure computer systems. In 1983, the DoD 
published the definitive guideline to secure operating systems, the Trusted 
Computer System Evaluation Criteria (TCSEC), better known as the 
“Orange Book” because of its orange cover. The TCSEC contains the 
certification criteria that a system (software, hardware, and firmware) must 
meet for multilevel secure standards. A key requirement of the TCSEC at 
higher levels of trust is assured protection, that is, the system cannot merely 
claim to be secure, but must be proven to be so (via source code evaluation, 
covert channel analysis, penetration testing, and test suites). The assurance 
testing is performed by the NCSC against the security division claimed by 
the vendor submitting the system for evaluation. 

The NCSC evaluates commercial operating systems against the TCSEC and
assigns security ratings to them. The TCSEC defines four security divisions
(D, C, B, and A, from the least to the most secure) containing numerical
classes (for example, B1, B2, and B3, from least to most secure). The full
range of ratings currently defined is, from least to most secure, D, C1, C2,
B1, B2, B3, and A1. The NCSC separates function and assurance at each
level; as the level of security becomes more rigid, the need for assurance
increases.

Three U.S. government directives, National Security Decision Directive 
145, Department of Defense Directive 5200.28, issued in 1988, and the 
National Telecommunications and Information Systems Security 
Committee's NTISSP no. 200, issued July 15, 1987, define C2 as the 
minimum level of protection required for a wide range of government 
computer systems. In addition, government procurements generally include a 
security requirement based on the TCSEC at the B1 or B2 level.





International governments, fearing the U.S. government would become the 
predominant force in computer security, began to define their own security 
criteria. By the late 1980s, the Canadian, British, French, Danish and German
governments were actively defining computer security standards, with the 
German and British governments publishing criteria and performing 
evaluations of secure systems. In Britain, the Department of Trade and 
Industry published the Green Book. In Germany, the German Information 
Security Agency published the White Book, while the French developed the 
Blue-White-Red Book. The German and British governments have published 
criteria and perform evaluations against their own criteria. An outgrowth of
the European security criteria work was the Information Technology Security 
Evaluation Criteria (ITSEC). The ITSEC was jointly developed by 
Germany, France, Britain, and the Netherlands and is being adopted by the 
European Economic Community (EEC) as the security criteria for a united 
Europe. 

While the various criteria have a great deal in common, there are differences 
as well. For example, the German White Book, while similar to the TCSEC, 
has some requirements above and beyond the TCSEC. One of these is a 
requirement to support Access Control Lists (ACLs) beginning at the B1 level 
(a B3 TCSEC requirement). The German criteria also separate function and
assurance. 

Interest in computer security within the commercial and public sector 
resulted in the formation of special interest groups, standards organizations, 
and the definition of security requirements groups within the commercial 
sector. Banking groups within both the United States and Europe have defined 
requirements for more secure systems. A consortium headed by American 
Express and Electronic Data Corporation recently published the Commercial 
International Security Requirements (CISR). Consortiums of hardware 
vendors, such as Eurobit, have begun working toward the establishment of 
common security goals. 

This high level of interest in secure systems led to the creation of several 
standards organizations. They were chartered to define interfaces and data 
definitions for secure systems. The International Standards Organization's 
Joint Technical Committee (ISO/JTC), IEEE/POSIX™, X/Open®, and
/usr/group (now UniForum™) all commissioned working groups in these
areas. In addition to the standards activity, the NCSC-sponsored Trusted
UNIX Working Group (TRUSIX) activity spawned task forces to work on
Discretionary Access Control (DAC), auditing, and security policy model
development.

In the user arena, American and international market interest in secure systems 
has grown immensely. At the recent ITSEC workshop, over 500 
representatives of users, governments, commercial interests, vendors, 
universities, and the public sector were present. The one common interest 
expressed by all participants was the need for increased security in computer 
systems. 


Government Security Needs 

Through the TCSEC, government needs and the validation/certification 
process are well defined and understood. National Security Decision 
Directive 145 establishes the need within the government for secure systems 
meeting or exceeding the C2 level. Requests For Proposals (RFPs) in the last 
few years have been written so that C2/B1 levels would meet the requirements 
at the time of requisition, but B2 would be required in the future. The recent 
United States Navy and Air Force RFP specifies a broad range of features 
including auditing (C2), Mandatory Access Control (MAC), and High 
Assurance (B2). 

Government agencies, like the Navy and Air Force, have requested security 
features beyond the C2/B1 level which provide auditing and minimal 
assurance. Most agencies are beginning to request systems with features at the 
B2 level, where additional security features provide penetration protection 
and increased data assurance. 

Many recent RFPs, such as the recent Joint Army and Navy (JSAN) request, 
have begun to specify Compartmented Mode Workstations (CMW). 
Specifically, a CMW is a B1-level workstation with B1+ extensions. The 
CMW requirements go beyond the TCSEC in requiring a trusted window 
manager and a second, MAC-like label known as a CMW label (also 
commonly referred to as an information label). 





Faced with the prospect of four different evaluation criteria in a united 
Europe, France, Britain, the Netherlands, and Germany recognized that this work
needed to be approached jointly and that harmonized evaluation criteria 
should be defined. Because the basic concepts and approaches of the four 
countries were much the same, they decided to take the best features of work 
already done and place them in a single set of criteria, the ITSEC. The intent 
of ITSEC is to have a set of common criteria for use within at least the four 
countries involved and eventually the entire EEC. So, evaluations performed 
in Germany could be honored in England. The ITSEC's single set of security 
evaluation criteria will, in turn, lead to a single set of well-defined
functions that would satisfy both American and EEC RFPs. Before this
can happen, the United States and the remaining EEC countries must buy into
the ITSEC. Based on the level of comments received on Draft 1 of the ITSEC at
the recent ITSEC workshop, much work must be done on this document
before it can be considered equal to the TCSEC. Draft 2 of the ITSEC is
planned for mid-1991, while a companion document, the ITSEC Evaluators
Guide, is planned, although no formal release date has been announced.


Commercial Security Needs 

Within the last few years, the commercial sector has begun to recognize the 
need for increased security in computer systems. It has also recognized
that, while government and commercial requirements have much in common,
they differ in some of their security needs. With this in mind, such areas as
financial institutions, telecommunications, and service industries began to 
develop their own profiles of security needs. Bell Communications Research 
(Bellcore), on behalf of the regional telephone companies, has been very active 
in the security arena and has developed its own security profile—the Bellcore 
Standard Operating Environment (SOE). American Express, in conjunction 
with a consortium of 40 other companies, developed the Commercial 
International Security Requirements (CISR). Within the commercial market, 
two basic factors hold true. First, market needs will evolve to more secure 
systems (as they become available); second, the features required by the 
TCSEC at the B2 level represent a superset of the security features required
for commercial use. 

In most cases, the commercial sector has identified the need for increased 
computer security but has failed to map the need into the features required to
provide the security they want. One step toward defining these requirements
has been the CISR. 

Contact with commercial users, either via standards organizations or trade 
shows, indicates that, as more secure systems become available, the market 
will evolve to them. For example, American and European interest in UNIX 
System V/Multilevel Security (SV/MLS), AT&T Federal Systems' B1-evaluated
product, has exceeded initial expectations.

An emerging sector in the late 1980s was the international market,
specifically within the EEC. For example, the Commission of the European 
Communities recently published a set of requirements, the Security 
Requirements for Extended POSIX Computers, that defines a list of 
mandatory and optional requirements. These requirements must be present in 
a product before the EEC will consider purchasing it. Most of these 
requirements are equivalent to TCSEC B2 features. 

Functionality vs Assurance 

The major difference between the TCSEC and the ITSEC and between 
commercial and governmental specifications is the requirement for a high 
degree of assurance. The TCSEC and the United States government require
that a system rated at the B level or above be functionally secure and provide
high assurance (for example, extensive penetration testing and
documentation).

After the B1 level, the requirements for assurance become far more stringent. 
At the B2 level, requirements for modularity, covert channel analysis, and 
limits on the number of trusted processes and global variables are introduced. 
The higher levels, B3 and A1, introduce even more stringent requirements,
requiring formal proofs of correctness. The ITSEC, following the lead of the 
German security requirements, separates function and assurance into different 
rating criteria. Thus, for example, a vendor could submit a system which 
meets stringent functional requirements (F-B3, for example) but with a far 
lesser degree of assurance (E3, for example). The same vendor requesting an 
evaluation against the TCSEC would have to submit the system at the B1 
level since it would not meet the B3 assurance criteria. 
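
The effect of separating function from assurance can be sketched in a few
lines of Python. This is only an illustration: the two lookup tables encode a
commonly cited correspondence between ITSEC classes and TCSEC classes, and the
name tcsec_equivalent is hypothetical. The point is that a TCSEC rating is
capped by whichever of the two components is weaker.

```python
# TCSEC classes from least to most secure, as listed in the text.
TCSEC_ORDER = ["D", "C1", "C2", "B1", "B2", "B3", "A1"]

# Assumed correspondence: the highest TCSEC class that each ITSEC
# functionality class or assurance class could support on its own.
FUNCTION_CEILING = {
    "F-C1": "C1", "F-C2": "C2", "F-B1": "B1", "F-B2": "B2", "F-B3": "A1",
}
ASSURANCE_CEILING = {
    "E0": "D", "E1": "C1", "E2": "C2", "E3": "B1",
    "E4": "B2", "E5": "B3", "E6": "A1",
}


def tcsec_equivalent(function_class, assurance_class):
    """Return the TCSEC class limited by the weaker of function and assurance."""
    f = TCSEC_ORDER.index(FUNCTION_CEILING[function_class])
    a = TCSEC_ORDER.index(ASSURANCE_CEILING[assurance_class])
    return TCSEC_ORDER[min(f, a)]


if __name__ == "__main__":
    # The vendor example from the text: strong function, weaker assurance.
    print(tcsec_equivalent("F-B3", "E3"))
    print(tcsec_equivalent("F-B2", "E4"))
```

Under these assumptions, the F-B3/E3 system from the example above is held to
B1 by its assurance level, even though its functions alone would qualify for a
higher class.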

The non-TCSEC-mapped classes are non-hierarchical and are defined as
F-IN (high integrity), F-AV (high availability), F-DI (safeguarding data
during interchange), F-DC (confidentiality of data during interchange), and
F-DX (network data interchange). Effectiveness (or assurance) classes range
from E0 (none) to E6 (high). Currently, SVR4.1ES maps to the F-B2/E4
ITSEC class.

Within the commercial sector, the need for the high degree of assurance 
associated with the higher levels of trust, specified by the TCSEC, has not 
yet been realized. While the need for the features has been identified, the need 
for the assurance that the features have been properly implemented has not. 

The commercial sector has not realized that the increased modularity 
provided to meet the increased assurance yields great benefits in portability 
and maintainability, and that the covert channel analysis serves to close holes 
in the system security policy. One must question how useful MAC labels are 
when covert channels exist where data can be easily reclassified. 

The single biggest advantage B2 provides, when compared to other secure 
systems, is in the area of security assurance. Security assurance is, as the name 
implies, assurance that the system's security features work as advertised and 
are implemented properly. While all B levels require assurance, the level of 
assurance required at the B2 level is significantly greater than that required at 
the B1 level. At the B2 level, covert channel analysis, extensive penetration 
testing, and requirements on the number of global variables are introduced. 
Additionally, the system must meet stringent modularity requirements. 
While frequently dismissed as TCSEC requirements with little commercial
use, these security assurances provide significant value. The penetration testing
and covert channel analysis provide assurance that the system has no back
doors by which the system's security policy could be circumvented.

Obviously, a secure system is worthless if it contains back doors that allow 
users access to the system, allow undetected access of data, or allow the 
security policy to be circumvented. Reducing global variables makes it far 
less likely that a change in one module will introduce changes in other 
modules. The modularity requirements yield a system which is easier to 
maintain and port. 

Covert Channel Analysis 

A covert channel is any means through which the system's security policy can
be circumvented by an unauthorized user. Unlike most holes, undocumented
covert channels are not audited; thus the exploitation almost always goes
undetected. In high assurance systems, an extensive search for these channels, a
covert channel analysis, is required. While many of the covert channels are
obscure and hard to exploit, several are common, well known and exploitable
by the knowledgeable and persistent. A covert channel analysis identifies the
channels so that the system vendor can eliminate each channel or make
modifications that render it unusable. Eliminating the methods by which the
security policy can be circumvented greatly increases the overall security of
the system.


The Role of Standards Organizations in Defining 
Computer Security 


In addition to government agencies that define security criteria, a number of 
standards bodies are also involved in evaluation of security criteria. These 
standards organizations have chartered groups to define standards for data 
interchange, secure interface specifications, and evaluation criteria. ISO is 
currently working on specifications for the interchange of auditing data. IEEE
POSIX has several groups working in the area of security. The primary group
working on security, P1003.6, is defining application interfaces for auditing,
MAC, CMW information labels, DAC, and least privilege. Other POSIX
groups such as P1003.7 (system administration) and P1003.17 (network
services) have spawned subcommittees and liaison activities to work with
P1003.6 to resolve the issues related to security administration and secure
networking. 

X/Open's Security Working Group (SWG) recently published the Security 
Interface Specifications: Auditing and Authentication snapshot document. This 
document, which was co-authored by AT&T/UNIX System Laboratories 
(USL) and IBM, was distributed for industry review late last year and has 
been adopted by POSIX P1003.6 as the base for its auditing specification.
TRUSIX has developed and published B3-level specifications and guidelines 
for ACLs and auditing. The TRUSIX auditing guidelines use much of the 
X/Open auditing work, while the TRUSIX ACL specification is based 
heavily on the SVR4.1ES ACL implementation. 





SVR4.1 Enhanced Security Background 


The UNIX system evolved in an open R&D environment where the free and 
easy exchange of information was paramount. Login accounts for guests 
without passwords, unprotected source and system files, and unrestricted data 
lines were typically found in these environments. From its inception, the 
UNIX system has always provided security features. 

The original UNIX system featured Discretionary Access Control, which 
provided the user with the ability to control file access by setting file 
permission bits defining the access permission for the owner, group, and 
others. The system also provided an identification and authentication (I&A)
scheme in which the system prompted the user for a system login identifier 
and an accompanying password. 
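
The owner/group/other permission check described above can be sketched as
follows. This is a simplified model, not the kernel's code: the function name
dac_allows and its parameters are hypothetical, and the special treatment of
UID 0 is deliberately left out.

```python
import stat


def dac_allows(mode, file_uid, file_gid, uid, gids, want):
    """Check classic permission bits; `want` is a string such as "r" or "rw".

    Exactly one class of bits is consulted: owner if the UIDs match,
    otherwise group if the file's group is among the caller's groups,
    otherwise "others".
    """
    if uid == file_uid:
        bits = {"r": stat.S_IRUSR, "w": stat.S_IWUSR, "x": stat.S_IXUSR}
    elif file_gid in gids:
        bits = {"r": stat.S_IRGRP, "w": stat.S_IWGRP, "x": stat.S_IXGRP}
    else:
        bits = {"r": stat.S_IROTH, "w": stat.S_IWOTH, "x": stat.S_IXOTH}
    return all(mode & bits[c] for c in want)


if __name__ == "__main__":
    # A file with mode 640 owned by UID 100, group 20.
    print(dac_allows(0o640, 100, 20, uid=100, gids={20}, want="rw"))  # owner
    print(dac_allows(0o640, 100, 20, uid=200, gids={20}, want="w"))   # group
    print(dac_allows(0o640, 100, 20, uid=300, gids={30}, want="r"))   # other
```

A file left at mode 777 passes this check for every caller, which is why such
modes defeat the purpose of the mechanism.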

Unfortunately, to facilitate unrestricted access, these features were frequently 
circumvented. Prime examples of this included programs with setuid/setgid 
bits, file modes left as 777, login scripts complete with login and password, 
and unrestricted guest logins. Even on systems where stringent security
measures were in force, security was frequently compromised for ease of
execution and debugging by setting applications to run with an effective UID
of 0, thus
making the application exempt from any system security requirements. 

Since early UNIX systems featured little administrative support, the
situation was made worse by administrative and operator errors which, in turn,
led to software holes through which hackers could gain unauthorized
privileges. One example of this would be leaving the system's crontab 
writable by someone other than root. With the introduction of open networks 
came the ability to rapidly interconnect and freely share data among 
multiple systems. The combination of unprotected data files, holes by which 
privilege could be obtained, and easy and rapid interconnection, provided the 
ideal environment for virus attacks. Several recent attacks, such as the Internet 
worm and its well-known exploitation of system security holes, have caused 
renewed interest in the development and procurement of highly secure UNIX 
systems. 

USL's approach in addressing the definition of security features has been to 
work closely with UNIX International (UI) to identify needs and evaluate
functional requirements. A parallel effort has proceeded with government and
industry leaders to establish standards through bodies such as IEEE POSIX
and X/Open. Within these standards bodies USL has been an active
participant and has been very successful driving these standards to be 
consistent with the SVR4.1ES functionality. 


Why B2? 

SVR4.1ES was designed to meet the rapidly increasing security needs within 
the UNIX System V community. The TCSEC B2 level seemed a natural fit 
for System V for several reasons. From a purely functional perspective, the B2 
level represents a superset of the features most desirable to government and 
commercial users. At the B2 level, MAC, ACLs, trusted path, and trusted 
facility administration are provided. The high level of assurance required to 
meet the B2 level is sufficient to satisfy the needs expressed by government 
and commercial customers. The higher levels (B3/A1) provide mostly 
increased assurance; little is provided in the way of additional function. 
However, functionality from the B3 and A1 classes was included in the 
SVR4.1ES feature set. SVR4.1ES TRUSIX-conformant ACLs and trusted 
facility administration are two examples of B2+ functionality included in 
SVR4.1ES to meet customer needs. Current and emerging RFPs and 
standards specifications (ITSEC, EEC security requirements, Navy and Air 
Force RFPs, and the CISR) all require functions at the B2 level. Most of these 
requirements also specify ACLs, a B3 feature. 

B2 represents what is likely the highest assurance level attainable by a UNIX 
system (barring a complete system rewrite) while having it still retain the 
classic UNIX system look and feel and maintain backward compatibility 
for existing applications. 

Due to the modular definition of features and the flexible packaging of 
SVR4.1ES, function levels from C2 to B2 can be defined. For example, a 
site may only install the auditing package on the SVR4.1ES base and have 
C2-level functions or they may install the B1/B2 package and configure the 
system to be either B1 or B2. 





The Architecture of SVR4.1ES 


The SVR4.1ES operating system engineering improvements go beyond 
individual feature development; they involve changes in the structure and 
architecture of UNIX System V. These changes result in improved 
maintainability, performance, flexibility, and portability. Typically, though 
not always, these improvements will be visible only to system porters and not 
to end users or application developers. Thus, while such improvements may 
benefit end users and developers, they are of direct interest to UNIX System 
V source code customers who plan to port or modify the operating system. 

The UNIX system has been renowned as a modular, highly portable, 
operating system. To meet the exacting requirements on operating system 
modularity at the B2-level, however, the UNIX System V operating system 
has been further partitioned into modules. 

Modularity and Global Variable Reduction 

The increased modularity yields great benefits in portability and 
maintainability. The SVR4.1ES source code tree has been broken down into 
"common" and hardware-specific trees. Porting code is easier since the 
"common" code has been identified and separated leaving only the machine- 
specific code to be ported. Additionally this modular breakdown provides 
increased maintainability. Since the common and hardware-specific code are 
now separate, a bug in "common" code needs only to be fixed in a single 
place. Previously this "common" code may have been shared between several 
modules, thus the fix would need to be applied several times in several 
places. 

Global variables, variables whose context is spread across several modules, 
are often used in places where a data value needs to be "shared" across several
modules, but it is not practical or desirable to break the modules down into 
functions (and thus pass the data by function call). While easier to implement, 
global variables can sometimes make maintenance difficult; modification of 
a variable in one place may have cascading side effects because the variable is 
used in several other, often undocumented places. With most of the global 
variables replaced by module-specific variables, modification of a variable 
in one section of code is far less likely to produce unknown (and unwanted) 
change in other areas of code where the variables may have been used. 





Improved modularity impacts more than the security features. It improves the
entire operating system and benefits all source code customers. Modular code 
is easier to interpret, maintain, and port. Since future SVR4 platforms such as 
Enhanced Security / Multiprocessing (ES/MP) are based on the SVR4.1ES 
source code, the effort required to port from SVR4.1ES to a future release is 
greatly reduced. 

The following sections briefly describe the major innovative security features 
found in SVR4.1ES: Identification and Authentication Facility (IAF), least
privilege/trusted facility administration, MAC, DAC, trusted path, and 
auditing. 

Identification and Authentication Facility (IAF) 

This facility provides a framework for modular replacement of 
identification and authentication schemes. SVR4.1ES provides two different 
types of schemes: the familiar login/password scheme and a new bilateral 
scheme called cr1 (Challenge-Response 1). The cr1 scheme implements a
simple challenge-response protocol which requires each system to store a
cryptographic key for each system with which it will communicate. The cr1
scheme supports the Basic Networking Utilities (BNU) and the remote system
administration facilities (see subsection Networking for a brief discussion of
BNU).
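
The general shape of a bilateral challenge-response exchange with a stored
per-peer key can be sketched as below. The text does not describe the
cryptography or message format of the actual scheme, so HMAC-SHA256 is used
here purely as a stand-in, and all function names are hypothetical.

```python
import hashlib
import hmac
import secrets


def make_challenge():
    """Generate a fresh random challenge so old responses cannot be replayed."""
    return secrets.token_bytes(16)


def respond(shared_key, challenge):
    """Prove knowledge of the shared key without sending the key itself."""
    return hmac.new(shared_key, challenge, hashlib.sha256).digest()


def verify(shared_key, challenge, response):
    """Recompute the expected response and compare in constant time."""
    return hmac.compare_digest(respond(shared_key, challenge), response)


def mutual_authenticate(key_held_by_a, key_held_by_b):
    """Bilateral exchange: each system challenges the other in turn."""
    challenge_from_a = make_challenge()
    challenge_from_b = make_challenge()
    b_ok = verify(key_held_by_a, challenge_from_a,
                  respond(key_held_by_b, challenge_from_a))
    a_ok = verify(key_held_by_b, challenge_from_b,
                  respond(key_held_by_a, challenge_from_b))
    return a_ok and b_ok


if __name__ == "__main__":
    key = secrets.token_bytes(32)
    print(mutual_authenticate(key, key))
    print(mutual_authenticate(key, secrets.token_bytes(32)))
```

Because each side must hold the key for every peer it talks to, the exchange
succeeds only when both systems store the same key for each other.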

Least Privilege/Trusted Facility Administration 

Originally, UNIX systems were small and non-networked. The single-privilege
concept worked well because the systems were typically
administered and maintained by a single administrator. As UNIX systems 
became more complex, the tasks of operation, administration, and 
maintenance were spread among several persons acting in specialized roles. 

For example, a system operator would be responsible for normal system 
operations while a system administrator would be responsible for system 
maintenance. Modifications to the operating system would be done by the 
systems programmer. In a computer center environment, machine usage
charges would be determined by a system auditor. An auditor or site security
officer might be charged with reviewing the system's audit/accounting trail to
determine if any suspicious activity had taken place. Clearly, the tasks
executed in these roles are quite different and the level of trust placed in the
individuals also varies. For example, the person examining system accounting
is clearly not trusted to shut down the system or examine other users' files.





The single-privilege root model does not lend itself well to a role-based 
scheme. 

The alternative is the concept of least privilege, or the ability to execute a 
task (such as mounting a file system) with the least or minimal amount of 
privilege required to successfully execute the task. This is clearly beneficial 
in situations where the individual executing a defined task is not trusted to 
execute a wide range of tasks. For example, the duties of the operator may be 
restricted to routine maintenance, such as setting the system date and time. 
The operator is not permitted to view or modify sensitive system files such 
as /etc/shadow. Following this example, the operator should only be 
permitted to execute a well-defined set of commands with privilege. 
Commands outside the defined role of the operator should not be permitted 
or, if permitted, should execute without any special privilege. Since the 
UNIX system historically has provided a single privileged identity, that of 
root (assigned User Id [UID] 0), all tasks requiring use of privilege are 
executed as root. Thus, the ability to delimit responsibility based on roles is 
difficult. Workarounds, such as restricted shells, work to a certain extent, but 
since the restricted environment is itself executing as root, they are always 
vulnerable and susceptible to attack. 

The SVR4.1ES Least Privilege/Trusted Facility Administration feature 
splits the single privileged role of root into well-defined, less-powerful roles. 
The SVR4.1ES system provides roles for system operator, security operator, 
site security officer, and auditor. The feature was designed to be flexible; 
sites can add additional roles or extend the existing roles as needed. Each 
role has a well defined set of privileged operations that the user in the role is 
permitted to execute. For example, the system operator is permitted 
privileged execution of the date command to set the system date and time, 
but is not permitted privileged execution of the ed command to view or 
modify sensitive system files. What follows is a brief description of how the 
SVR4.1ES least privilege/trusted facility administration works and how it 
eliminates some of the more commonly exploitable flaws in the single-privilege
mechanism.





In SVR4.1ES, a process has a maximum and working set of privileges 
associated with it. The maximum set represents the most privilege the 
process could ever attain, and the working set represents the minimum set of 
privileges required to execute the task. An executable file may have 
associated with it an inheritable or fixed set of privileges. An inheritable
privilege is a privilege that is kept (that is, left “turned on”) only if it
already existed in the process. A fixed privilege is a privilege that is always 
given to the process, independent of the previous process privileges. When a 
file is exec'ed these sets are computed as illustrated in Figure 1. 


exec:

(1) intersection of the maximum set of privileges of the invoking process
    with the file's inheritable privileges

(2) union of the results of (1) with the file's fixed privileges


Note: The fixed and inheritable privilege sets are disjoint; a privilege cannot be present in both 
sets at the same time. 


Figure 1. Computing Privilege Sets 

For compatibility with the current UNIX system setuid mechanism, 
SVR4.1ES supports the concept of fixed file privileges. When a file that has 
fixed privilege is executed, those privileges are added (unioned) to the 
maximum privilege set of the invoking process; this forms the maximum and 
working privilege sets for the resulting process. Note that the fixed 
privileges are not added to the maximum or working privilege sets of the 
invoking process. 




For example, if a site determined that all users should be able to execute the 
ps command and not be subject to mandatory or discretionary access control 
checks, the administrator would use the filepriv command to set the 
p_DACread and p_MACread privileges as fixed privileges. Any user
invoking ps would then acquire the p_DACread and p_MACread privileges 
for the duration of the execution of the ps command. 
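
The two steps of Figure 1 map directly onto set operations. The sketch below
models privileges as Python sets; the function name is hypothetical, and it
returns the single resulting set that the figure describes without modeling
any later difference between the maximum and working sets.

```python
def privileges_after_exec(process_max, file_inheritable, file_fixed):
    """Compute the privilege set of the new process as in Figure 1."""
    if file_inheritable & file_fixed:
        # The note under Figure 1: a privilege cannot be in both file sets.
        raise ValueError("fixed and inheritable sets must be disjoint")
    kept = process_max & file_inheritable  # step (1): intersection
    return kept | file_fixed               # step (2): union with fixed


if __name__ == "__main__":
    # The ps example: an unprivileged user runs a file with fixed privileges.
    print(privileges_after_exec(
        process_max=set(),
        file_inheritable=set(),
        file_fixed={"p_DACread", "p_MACread"},
    ))
    # An inheritable privilege survives only if the caller already had it.
    print(privileges_after_exec(
        process_max={"p_sysops", "p_DACread"},
        file_inheritable={"p_sysops"},
        file_fixed=set(),
    ))
```

The function returns a new set and never modifies process_max, which mirrors
the statement that fixed privileges are not added to the invoking process.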

Trusted Facility Management (TFM) 

The trusted facility management (tfadmin) facility redefines the way in 
which the role/privilege assignment mechanism works. In current UNIX 
systems, an administrator will log in (or su) to an administrative identity.
The administrator assumes all file access rights (and privileges in the case of 
root/UID 0) associated with the identity. All subsequent processes assume 
these privileges. With this in mind, there are several scenarios by which the 
vulnerabilities of the system may be exploited. For example, logged in as 
root, the administrator invokes 

$ date 010191 (set system date & time) 

$ mail 

The administrator assumes that /bin/date and /bin/mail are being executed. 
However, since a full pathname was not specified, the administrator is relying 
on the PATH variable being properly set such that the correct commands are 
executed. 

If the user's path searches $HOME/my.bin, and a malicious user was able to 
plant rogue versions of date and mail in $HOME/my.bin, then the 
administrator would unknowingly have executed Trojan horse versions of 
these commands and would have given root access away. 
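
The search-order problem can be demonstrated without touching a real file
system. In this sketch the set of executables is simulated, the directory
/home/admin/my.bin stands in for $HOME/my.bin, and the function name resolve
is hypothetical; it mimics how a shell picks the first match along PATH.

```python
import posixpath


def resolve(command, path_dirs, executables):
    """Return the first matching executable along the search path, or None."""
    if "/" in command:
        # A full pathname bypasses the PATH search entirely.
        return command if command in executables else None
    for directory in path_dirs:
        candidate = posixpath.join(directory, command)
        if candidate in executables:
            return candidate
    return None


if __name__ == "__main__":
    # Simulated file system containing genuine and rogue copies of date.
    files = {"/bin/date", "/bin/mail", "/home/admin/my.bin/date"}
    unsafe_path = ["/home/admin/my.bin", "/bin"]
    print(resolve("date", unsafe_path, files))       # rogue copy wins
    print(resolve("/bin/date", unsafe_path, files))  # full pathname is safe
```

Whichever directory appears first on the search path decides which program
runs, so a writable directory ahead of /bin is enough to substitute a Trojan
horse.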

An additional problem is caused by the inheritance mechanism of exec. Since 
all the attributes associated with the root identity are passed to child 
processes via exec, all processes invoked by the administrator execute with 
privilege, regardless of need. This often results in execution of code not 
expected to run with root privilege and not designed with trust in mind. This 
is especially dangerous with commands that in turn execute other commands 
or that feature escapes to the shell. For example, an administrator escapes to 
the shell from mail and executes cat. Since mail was running as root, the cat 
command was also executed as root. If a rogue version of cat was executed, 
root privilege has inadvertently been given away. 





With tfadmin there are no privileges inherent in a single user identity;
rather, privileges are associated with a defined role, and user identities are
then associated with the role. Privilege is acquired by executing tfadmin. The
tfadmin command has an administrative database associated with it. The
database contains entries in the following format:

role:alias:command:privilege(s) 

For example: 

secadmin:date:/bin/date:p_sysops 

User identities are then placed in roles; for example, the user Elroy could be 
placed in the role secadmin. 

The entry described above allows the user, Elroy, in the role of secadmin, to 
execute the command /bin/date with the p_sysops privilege. For ease of use, 
the secadmin role can use the alias date in place of typing in /bin/date, which 
could become tedious. 

Building on the previous example, consider the following commands:

$ tfadmin date 010191 
$ mail 

Upon execution, the tfadmin command searches its database for an entry for 
date in the role invoking tfadmin. If a match is found, the command is 
executed (via its fully qualified pathname) only with the explicit privileges 
needed to perform the requested operation. In this case, only the p_sysops
privilege is needed to set the date. This is then the only privilege passed to 
the process executing date. The next command, mail, requires no privilege to 
run; therefore, execution via tfadmin is unnecessary. Because tfadmin will 
only associate privilege with a defined entry, the command 

tfadmin mail 

would fail because no database entry would be defined for mail. 
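
The lookup described above can be sketched in Python. This is an illustration only, not the tfadmin implementation: the role membership table, the comma as a separator between multiple privileges, and the function names are assumptions made for the example. Only the entry format and the sample entry come from the text.

```python
# Illustrative model of the tfadmin database lookup.
# Entry format from the text: role:alias:command:privilege(s)
TFADMIN_DB = ["secadmin:date:/bin/date:p_sysops"]

# Hypothetical role membership table (the text places Elroy in secadmin).
USER_ROLES = {"elroy": "secadmin"}


def parse_entry(line):
    """Split one database line into (role, alias, command, privileges)."""
    role, alias, command, privs = line.split(":", 3)
    # Assumption: multiple privileges are separated by commas.
    return role, alias, command, privs.split(",")


def lookup(user, alias):
    """Return (pathname, privileges) for the alias in the user's role, or None."""
    role = USER_ROLES.get(user)
    for line in TFADMIN_DB:
        entry_role, entry_alias, command, privs = parse_entry(line)
        if entry_role == role and entry_alias == alias:
            return command, privs
    return None


print(lookup("elroy", "date"))  # ('/bin/date', ['p_sysops'])
print(lookup("elroy", "mail"))  # None: no entry, so "tfadmin mail" fails
```

The command is returned by its fully qualified pathname together with only the privileges listed in the entry, which mirrors the behaviour the text describes for date.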

Mandatory Access Control (MAC) 

In order to meet customer needs for high data integrity, Mandatory Access 
Control labels have been added to SVR4.1ES. With the addition of MAC, 
all processes, files, and interprocess communication (IPC) objects must have 
a security label. The DAC mechanism allows permissions to be set at the 
discretion of the owner of an object; the owner of a file is able to determine 
who can (and cannot) access the file. On the other hand, the system 
administrator sets the MAC mechanism and the system enforces it. The file 
owner does not set the initial MAC label and is unable to change it. 

The SVR4.1ES mandatory access control policy follows a modified Bell- 
LaPadula model [2] that can be summarized as "read equal or down" and 
"write equal." Assume a situation where "top secret" dominates "secret" and 
"secret" dominates "unclassified." Naturally, each level can write only to 
files on its own level; but a process at level "top secret" can read files at each 
class, since it dominates them all. A process at the "secret" level, however, 
would be able to read files only at "secret" and "unclassified" levels. 
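
The "read equal or down, write equal" rule can be illustrated with a minimal Python sketch. The numeric ranks are an assumption made for the example, and categories are ignored here; only the three level names and their ordering come from the text.

```python
# Hierarchy from the text: "top secret" dominates "secret", which
# dominates "unclassified". The numeric ranks are assumed for this sketch.
RANK = {"unclassified": 0, "secret": 1, "top secret": 2}


def dominates(a, b):
    """True if level a dominates level b (a is equal to or higher than b)."""
    return RANK[a] >= RANK[b]


def can_read(process_level, file_level):
    """Read equal or down: the process must dominate the file."""
    return dominates(process_level, file_level)


def can_write(process_level, file_level):
    """Write equal: the two levels must match exactly."""
    return process_level == file_level


print(can_read("secret", "unclassified"))   # True
print(can_read("secret", "top secret"))     # False
print(can_write("top secret", "secret"))    # False
```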

Administrators are responsible for determining and setting up the discrete set 
of labels at which a user can log in as well as setting the login level range on 
terminal lines. The login level range restricts system access such that when a 
user attempts to log in, the label the user specified must dominate the login- 
low label on the terminal line and in turn be dominated by the login-high 
label on the terminal line. For example, a terminal line with a login level 
range of "secret" to "top secret" would be inaccessible to a user at 
"unclassified" (since "unclassified" does not dominate the login-low level, 
"secret"). 

By default, SVR4.1ES supports 256 classifications and 1,024 categories, 
though the system can be configured to support up to 65,535 classifications 
and 2,097,152 categories. For reasons of disk space and performance, 
SVR4.1ES implements MAC labels with an indirection scheme. Each named 
classification/category tuple (that is, fully qualified label) is associated with 
a unique level identifier also known as an LID. The LID serves as a system 
pointer to the fully qualified label name and is the value stored in the inode. 
For reasons of user convenience, each fully qualified label may be assigned an 
alias name. The alias name is a shorthand representation of the fully qualified 
label. For example, the alias for the label 

Top Secret:projectA,projectB 

may be 
TS 





Access Isolation 


The kernel uses the LID as the primary method of label reference. When the 
kernel is requested to check access, the LIDs involved in the access 
determination are compared. If write access is requested, the LIDs 
themselves are simply compared (since the system enforces a policy of write 
equal and the LIDs are guaranteed to be unique). For example, if write access 
to a file with an LID of 10045 is requested by a process with an LID of 10045, 
access is granted since the LIDs are equal. However, if write access is 
requested to the same file by a process with an LID of 10046, access is denied 
because the LIDs are not equal. The system supports a policy of read down, so 
the access check required for a read operation requires an additional step. 

Since no hierarchy can be determined by comparing two LIDs (that is, 
LID 10046 is not guaranteed to dominate LID 10045), the binary 
representation of the fully qualified labels of the two LIDs needs to be 
compared. For reasons of system performance, the binary representation of the 
labels is kept in a cache; its size is a system tunable that may be increased or 
decreased as required. For example, if a read operation was requested to a 
file with an LID of 10045 by a process with an LID of 10046, the system 
would do the following: 

• Check to see if the binary representations of the LIDs to be compared 
were already in the cache. 

• If the binary representations of both LIDs were not in the cache, the 
system would read the LID database and bring the binary 
representation of the LIDs into the cache. 

• The binary representations of the LIDs would be compared to 
determine if a dominance relationship exists (that is, read access). If 
so, access would be granted. If not, access would be denied. 
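
The steps above can be sketched as follows. This is a simplified model, not the kernel's code: labels are held as Python tuples rather than a binary representation, the label contents behind each LID are invented, and the dominance rule (classification at least as high and categories a superset) is an assumption. Only the LID values 10045 and 10046 come from the text.

```python
# Hypothetical LID database: LID -> (classification rank, set of categories).
LID_DATABASE = {
    10045: (1, frozenset({"projectA"})),
    10046: (2, frozenset({"projectA", "projectB"})),
}

label_cache = {}  # stands in for the tunable label cache


def fetch_label(lid):
    """Return the label for an LID, reading the database only on a cache miss."""
    if lid not in label_cache:
        label_cache[lid] = LID_DATABASE[lid]
    return label_cache[lid]


def dominates(a, b):
    """Assumed rule: rank at least as high and categories a superset."""
    return a[0] >= b[0] and a[1] >= b[1]


def may_write(process_lid, file_lid):
    """Write equal: unique LIDs can be compared directly."""
    return process_lid == file_lid


def may_read(process_lid, file_lid):
    """Read equal or down: compare the full labels behind the LIDs."""
    return dominates(fetch_label(process_lid), fetch_label(file_lid))


print(may_write(10046, 10045))  # False: LIDs differ
print(may_read(10046, 10045))   # True under the invented labels
```

The write check never touches the label database, which is the performance point the indirection scheme is making; only the read check pays for a label lookup, and the cache absorbs repeated lookups.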

An additional form of data integrity, access isolation, has been achieved 
through the judicious use of mandatory access control levels. By setting up a 
label hierarchy such that user-defined labels are disjoint (that is, they do not 
dominate) from system-defined labels, the system is partitioned. Users are 
prohibited (via MAC) from reading, modifying, or executing sensitive 
system files, and administrators are protected from inadvertently executing 
untrusted code. Figure 2 illustrates how such a lattice may be defined. 





[Diagram: label lattice divided into a User portion and a System portion]

Figure 2. Access Isolation Mechanism 

In the lattice depicted above, the levels USER_PUBLIC and 
USER_LOGIN are defined for non-administrative use. The level 
USER_PUBLIC is defined for non-administrative user files and commands 
(emacs, databases, etc.). The level USER_LOGIN is defined for 
non-administrative system access; by default, all non-administrative users access 
the system at this level. The levels SYS_PUBLIC, SYS_PRIVATE, 
SYS_OPERATOR, and SYS_AUDIT are defined for administrative and 
system use. The level SYS_PUBLIC is defined for files and commands 
accessible to both administrators and users (such as mail, mount, and date). The level 
SYS_PRIVATE is defined for administrative system access and is not 
accessible by non-administrative users. The level SYS_AUDIT is reserved 
for storing the system audit trail. 

Considering the lattice defined above, the commands date and mail would 
be labeled at SYS_PUBLIC. Since both the user and system portions have 
read access to data labeled at SYS_PUBLIC, both administrators and users 
have execute permission for these commands. Since the user does not have 
write permission at the SYS_PUBLIC level (MAC restricts write access), a 
user cannot plant a Trojan horse at this level. Note that, since the level 
SYS_PRIVATE dominates SYS_PUBLIC, the administrator does not 
require either mandatory or discretionary override privilege to access these 
files. Thus, the administrator executing these commands does not have 
mandatory access control override permissions and may only execute 
commands and read files at levels dominated by SYS_PRIVATE. Since the 
administrator at SYS_PRIVATE does not dominate either USER_PUBLIC 
or USER_LOGIN, and does not acquire the privilege required to circumvent 
mandatory access control, the administrator is protected from invoking 
Trojan horse programs planted at this level by users. 
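
The partitioning argument can be checked with a small Python model of a partial order. The dominance pairs below are a partial guess at Figure 2: the pairs marked "assumed" are not stated in the text, and SYS_OPERATOR and SYS_AUDIT are left out because the text does not say what they dominate.

```python
# Dominance pairs as (higher, lower). Every level also dominates itself.
DOMINATES = {
    ("SYS_PRIVATE", "SYS_PUBLIC"),  # stated in the text
    ("USER_LOGIN", "SYS_PUBLIC"),   # implied: users can read SYS_PUBLIC
    ("USER_LOGIN", "USER_PUBLIC"),  # assumed for this sketch
}


def dominates(a, b):
    """True if level a dominates level b in the sketched lattice."""
    return a == b or (a, b) in DOMINATES


def can_read_or_execute(process_level, file_level):
    """Read equal or down."""
    return dominates(process_level, file_level)


def can_write(process_level, file_level):
    """Write equal."""
    return process_level == file_level


# A user can run date (labeled SYS_PUBLIC) but cannot replace it.
print(can_read_or_execute("USER_LOGIN", "SYS_PUBLIC"))    # True
print(can_write("USER_LOGIN", "SYS_PUBLIC"))              # False
# An administrator cannot execute a program planted at a user level.
print(can_read_or_execute("SYS_PRIVATE", "USER_PUBLIC"))  # False
```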


Discretionary Access Control 

SVR4.1ES provides two complementary DAC mechanisms: UNIX system 
file permission modes and TRUSIX-conformant Access Control Lists 
(ACLs). The UNIX file permission modes are retained from previous 
releases of UNIX System V for compatibility. Users already familiar with 
UNIX system file permissions will find that this mechanism still works as 
expected. 

The SVR4.1ES ACLs are designed to satisfy the B3-level Orange Book 
requirements while still retaining compatibility with the UNIX system file 
mode scheme. The SVR4.1ES ACL mechanism allows for finer control than 
do existing file permission bits. It provides the owner of an object the 
ability to grant or deny access by other users to the granularity of a single 
user. 

For convenience, SVR4.1ES ACLs also allow access rights to members of 
groups (as defined to the system in the /etc/group administrative file). ACLs 
can also be arbitrarily large; that is, the number of ACL entries is not limited 
by the system. The system administrator can set the maximum number of 
entries per ACL by setting a tunable parameter. (Naturally, as ACLs get 
larger, processing gets slower, which induces a practical limit on the number 
of ACL entries.) 

In SVR4.1ES, an ACL is associated with every file system object and IPC 
object. ACLs for file system objects are stored in the associated inode; the 
first seven entries are stored in the inode, and other entries are stored in 
indirectly referenced disk blocks. ACLs for IPC objects are stored in an 
internal structure associated with the instantiation of the IPC object. 

An ACL contains all the DAC-access information for the object with which it 
is associated. For the sake of compatibility, file permissions are displayed as 
usual in the expected situations, and operations on files behave as they would 
be expected to on any UNIX System V-based operating system. However, in 
SVR4.1ES, file permission bits are actually translated into and stored as 
ACL entries. The ACL entries, which are derived from the file owner, file 
owner group, and other permission bits, are called base entries. Permission 
can be granted or denied beyond the base entries by including additional 
ACL entries. The concept of the file group class ACL entry is critical to 
understanding SVR4.1ES ACLs. Historically, the middle three permission 
bits have been used to indicate the permission granted to the file owner's 
group. In SVR4.1ES, when additional ACL entries are added, these middle 
three bits, the file group class, are used to represent the maximum permission 
granted by an additional ACL entry. This is done for compatibility with 
existing applications, which depend on stat (as opposed to access), to 
determine access permission. A simple SVR4.1ES ACL would appear as 
follows. The numbers in parentheses are used to indicate the association 
between the permission bits, owner and group, and the ACL. They do not 
appear in SVR4.1ES ACLs. 


    (4)(5)(6)        (2)   (3)                    (1)
    rwxr-xr-x+   1   fred  demo   73 Jan 6 20:27  run.sh

    #file: run.sh        (1)
    #owner: fred         (2)
    #group: demo         (3)
    user::rwx            (4)
    user:larry:--x       \
    group::r-x      (5)   >  ORing these entries provides the class entry
    group:sys:---        /
    class:r-x
    other:r-x            (6)

    Notes: The + sign indicates that the file has an associated ACL.
    The class entry is always equal to the group permission bits. Thus, running
    stat on the file provides the maximum permission granted by the ACL.

Figure 3. A Simple SVR4.1ES ACL 

In the ACL depicted above, the first three entries (prefixed with #) indicate 
the name of the file (run.sh), the owner of the file (fred) and the file owner's 
group (demo). The entries for owner, file owner group, and other are derived 
from the source file's permission bits. The user and group entries with null 
ID fields represent the permissions of the owner and owner's group. The 
entries are NULL so that if the file is given away, the ACL entries (owner and 
group) are still correct. The class entry was computed by ORing the additional 
ACL entries for larry, sys, and the file group class.¹ 
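
As a worked example, the class entry of Figure 3 can be recomputed by ORing the three additional entries. The helper names below are invented for this sketch, and the permission strings are read from the figure as --x, r-x, and ---.

```python
def to_bits(perm):
    """Convert a string such as 'r-x' to a 3-bit integer (r=4, w=2, x=1)."""
    return sum(bit for ch, bit in zip(perm, (4, 2, 1)) if ch != "-")


def to_text(bits):
    """Convert a 3-bit integer back to a string such as 'r-x'."""
    return "".join(ch if bits & bit else "-" for ch, bit in zip("rwx", (4, 2, 1)))


# Additional entries from Figure 3 (the file owner group counts as one).
additional = {"user:larry": "--x", "group::": "r-x", "group:sys": "---"}

class_bits = 0
for perm in additional.values():
    class_bits |= to_bits(perm)

print(to_text(class_bits))  # r-x
```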

An ACL consists of the following types of entries, which must be in the 
following order: 

• user entry - This entry is derived from the file owner permission bits; 
it contains a user ID and the permissions associated with it. There is 
always one entry of this type. It represents the object owner and is 
denoted by a null (unspecified) user ID. There may be additional, 
unique user entries. 

• group entry - This entry is derived from the file group permission 
bits; it contains a group ID and the permissions associated with it. 
There is always one entry of this type. It represents the object owning 
group and is denoted by a null (unspecified) group ID. There may be 
additional, unique group entries. 

• other entry - This type of entry contains the permissions granted to a 
subject if none of the above entries have been matched. There is exactly 
one of these entries in an ACL. 

• class entry - This type of entry contains the maximum permissions 
granted to the file group class. There is exactly one of these entries in 
the ACL. The class entry indicates the maximum permission allowed 
by the ACL. Additionally, this entry acts as a mask and provides 
compatibility for existing applications. These applications obtain file 
access permission via stat and attempt to change file status via chmod. 
For example, see Figure 4. 


¹ The SVR4.1ES ACL mechanism treats the file owner group entry as an additional ACL entry (that is, not a base 
entry). 





    Before chmod 000     After chmod 000      After chmod 755 (reset mode bits)

    rwxr-xr-x+           ---------+           rwxr-xr-x+

    #file: run.sh        #file: run.sh        #file: run.sh
    #owner: fred         #owner: fred         #owner: fred
    #group: demo         #group: demo         #group: demo
    user::rwx            user::---            user::rwx
    user:larry:--x       user:larry:--x       user:larry:--x
    group::r-x           group::r-x           group::r-x
    group:sys:---        group:sys:---        group:sys:---
    class:r-x            class:---            class:r-x
    other:r-x            other:---            other:r-x


Figure 4. Modification of Mode Bits and ACL Using chmod 

• default entry - This type of entry may only exist on a directory. These 
entries are similar to the entries described above, except that they are 
never used in an access check. Rather, they are used to indicate the user, 
group, and other ACL entries that should be added to a file created 
within the directory. 

Referring to the example above, notice that the ACL entries for file owner, 
other, and file group class are changed to reflect the intended setting of the 
permission bits (via chmod). No additional ACL entries are modified. The 
intended effect of the chmod 000 is accomplished by using the file group 
class entry as a mask. Note that chmod did not modify the file owner group 
entry. This is because the SVR4.1ES implementation treats the file owner 
group as an additional ACL entry. 
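
The masking behaviour can be modelled in a few lines of Python. This is a sketch under stated assumptions: permissions are held as 3-bit integers, the function names are invented, and the rule that an additional entry's effective permission is its own bits ANDed with the class entry is inferred from the text's description of the class entry as a mask rather than taken from the SVR4.1ES source.

```python
# ACL from Figure 4 with permissions as 3-bit integers (r=4, w=2, x=1).
acl = {
    "user::": 0o7, "user:larry": 0o1, "group::": 0o5,
    "group:sys": 0o0, "class": 0o5, "other": 0o5,
}


def chmod(acl, mode):
    """Apply mode bits: owner, class, and other change; nothing else does."""
    updated = dict(acl)
    updated["user::"] = (mode >> 6) & 0o7
    updated["class"] = (mode >> 3) & 0o7
    updated["other"] = mode & 0o7
    return updated


def effective(acl, entry):
    """Assumed rule: additional entries are limited by the class entry."""
    if entry in ("user::", "other"):
        return acl[entry]
    return acl[entry] & acl["class"]


locked = chmod(acl, 0o000)
restored = chmod(locked, 0o755)

print(locked["user:larry"])             # 1: the stored entry is untouched
print(effective(locked, "user:larry"))  # 0: but the class mask denies access
print(restored == acl)                  # True: chmod 755 brings everything back
```

Because the additional entries are never rewritten, resetting the mode bits restores the original access without the application having to know that an ACL exists.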

Trusted Path 


The SVR4.1ES trusted path feature provides a direct communications path 
between the user and the system such that all sensitive information entered by 
the user is in fact being transmitted to the system. Before gaining access to the 
system, the user must enter a key sequence, the Secure Attention Key (SAK). 
When the SAK is recognized by the system, a login prompt is generated. 

This differs significantly from historical applications of login where 
initiation only requires the user to enter return. The SVR4.1ES trusted path 
feature provides protection against both system access attacks and Trojan 
horses. 

A common form of system access attack is performed by programs which 
randomly dial numbers, wait for a tone, and then wait for the system to 
generate a login prompt. These programs then try combinations of login and 
passwd to attempt to gain system access. By requiring the user attempting to 
gain system access to enter a key sequence before generating a login prompt, 
many simple access programs, keyed on the login prompt, are defeated. This 
is because the login prompt never appears. More advanced programs have yet 
another layer of protection (the SAK) to penetrate. 

A common form of Trojan horse associated with login involves planting a 
malicious program that masquerades as a login. A malicious user will log 
into the system and start a program which itself poses as login. This program 
then sleeps until an unsuspecting user attempts to gain access to the system. 
The rogue login program then reads the user's login and password, records 
them, and then exec's a version of the real login program. The unsuspecting 
user has no idea his password has been given away. The SVR4.1ES trusted 
path facility defeats this type of attack by killing all active processes 
associated with the tty on which the SAK was entered. Thus, in the case of the 
rogue login program, the process posing as login is killed, and the user 
communicates with the true login program. 

By default, the SAK is a line drop, although the administrator can configure 
it to be a character or asynchronous line condition, such as a break. 

The SVR4.1ES trusted path feature works as follows: 

1. A user requesting access to the system enters the SAK. 

2. The system identifies the SAK before any line discipline is applied. 

3. On detecting the SAK, the trusted computing base (TCB) 
terminates any current login session, permanently puts open 
connections in a state such that they can no longer be used for 
terminal I/O, and eventually reinitiates the login sequence. 

4. If login is not completed within the login timeout period, the login 
program will enter a mode where login interaction cannot proceed 
until the SAK is entered again. 
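
The sequence above can be pictured with a toy Python model of a single terminal line. The class and method names are invented for illustration, and details such as how connections are invalidated are omitted.

```python
class TerminalLine:
    """Toy model of one tty under the trusted path rules listed above."""

    def __init__(self):
        self.processes = []       # names of processes attached to the tty
        self.login_ready = False  # True while a genuine login prompt is active

    def start(self, name):
        """Attach a process to the line."""
        self.processes.append(name)

    def secure_attention_key(self):
        """SAK: kill everything on the line, then start the real login."""
        self.processes = ["login"]
        self.login_ready = True

    def login_timeout(self):
        """No login within the timeout: require the SAK again."""
        self.login_ready = False


line = TerminalLine()
line.start("fake_login")  # Trojan horse posing as login
line.secure_attention_key()
print(line.processes)     # ['login']
```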

Auditing 

Hand in hand with the ability to penetrate system security is the ability to do 
so without detection. On most UNIX systems the only record of process 
execution is the information saved by the UNIX system's process accounting 
facility. While this data provides some insight into what may have occurred 
on the system, it can be spoofed and does not provide sufficient data to fully 
determine an intruder's actions. Additionally, existing UNIX process 
accounting provides no granularity; it is an all-or-nothing feature. Either 
accounting is enabled for all (known) events, for all users, or it is completely 
disabled. Since recording accounting data is done on an all-event, all-user 
basis, a lot of system resources are expended. For this reason, it is frequently 
not used. 

These shortcomings have been corrected in SVR4.1ES with the addition of 
system auditing. Like accounting, auditing records events that occur on the 
system. However, in addition to simply recording the occurrence of events, 
auditing also records the parameters associated with the events and the 
outcome of the events. 

Granularity is provided at both the event and user level; that is, the 
administrator can select specific events which will be audited and can specify 
the users for whom those events are audited. Since the system's audit daemon 
runs with a MAC level disjoint from all defined user levels, unprivileged 
users cannot detect the presence of the audit daemon (that is, they cannot 
tell whether auditing is active). SVR4.1ES provides an audit mechanism capable of 
recording and reporting on all security-related events that occur on the 
system, including those events identified as being associated with covert channels. 

SVR4.1ES associates most audit events with a system call. For example, the 
mk_dir and rm_dir events map to the mkdir and rmdir system calls. Since 
system administrators tend to think in terms of system events, SVR4.1ES 
provides the concept of an event class. The class mechanism allows for a 
logical grouping of event types. For example, the mk_dir and rm_dir events 
fall into the dir_make class. Since auditing tends to generate large amounts 
of data, and since an administrator may wish to select most but not all the 
event types within a class, SVR4.1ES permits selection by both event type 
and class. Additionally, the selections can be intermixed (a class may be 
selected and one or more types within the class may be turned off). 
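
Intermixed selection can be sketched as set arithmetic in Python. The function and parameter names are invented for this example; only the dir_make class and its two event types come from the text.

```python
# Class membership from the text; other classes would be listed the same way.
EVENT_CLASSES = {"dir_make": {"mk_dir", "rm_dir"}}


def select_events(classes=(), types_on=(), types_off=()):
    """Expand selected classes, add individual types, then remove exclusions."""
    selected = set(types_on)
    for name in classes:
        selected |= EVENT_CLASSES[name]
    return selected - set(types_off)


# Select the whole class, then turn one type within it off.
print(sorted(select_events(classes=["dir_make"], types_off=["rm_dir"])))  # ['mk_dir']
```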

Since a certain subset of applications may wish to add records to the audit 
trail, the SVR4.1ES audit feature provides the ability for applications to add 
their own free-format records to the audit trail. Multiple site or application 
records may be defined. These added records can be selected and later 
reported using the standard SVR4.1ES selection and reporting tools. 

Events deemed critical to the integrity of the system (that is, events critical 
to the integrity of the audit trail) are always audited whenever auditing is 
enabled, regardless of the system-wide and per-user event masks. These events 
are called fixed events. Other events are auditable at the discretion of the 
system administrator. They are called selectable events. 

As stated above, events may be set on either a system-wide or per-user basis. 
System-wide events are selected by the administrator with the auditset 
command. The auditset command may also be used after auditing is enabled to specify 
additional events to be audited, or to unselect events that no longer require 
auditing. 

Per-user audit masks may be designated for each user by using the useradd 
command. These masks are permanent: whenever auditing is enabled and the 
user is logged on, events specified in these masks will be audited. The set of 
fixed events, along with the system-wide and per-user audit masks, are ORed to 
form the user's process audit mask. 
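
The combination of the three masks is a bitwise OR, which a short Python sketch makes concrete. The bit assignments, the LOGIN and AUDIT_CTL event names, and the choice of AUDIT_CTL as a fixed event are all hypothetical; only mk_dir and rm_dir are event names taken from the text.

```python
from enum import IntFlag


class Event(IntFlag):
    # Bit values are arbitrary; LOGIN and AUDIT_CTL are hypothetical names.
    MK_DIR = 1
    RM_DIR = 2
    LOGIN = 4
    AUDIT_CTL = 8


FIXED = Event.AUDIT_CTL  # assumed fixed event, always audited


def process_audit_mask(system_mask, user_mask):
    """OR the fixed events with the system-wide and per-user masks."""
    return FIXED | system_mask | user_mask


mask = process_audit_mask(Event.MK_DIR, Event.LOGIN)
print(Event.AUDIT_CTL in mask)  # True: fixed events are always present
print(Event.RM_DIR in mask)     # False: not selected by either mask
```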

Each auditable event, when audited, generates an associated audit record. 
Collected for each event audited are a time stamp, the user identity, object 
name, level of the process (subject) causing the event, privileges used, an 
identification of the type of event, and an indication of the success or failure 
of the event. Other information specific to the event type is also collected. 
The auditrpt command is used to select, format, and print data from the log 
file. 


Networking 

The features described above are all part of the SVR4.1ES Trusted 
Computing Base (TCB), currently undergoing NCSC evaluation. In its 
evaluated configuration, SVR4.1ES is a stand-alone system; no network 
capability is present. In many cases, such a configuration is not practical or 
even realistic. To accommodate environments which require high-security 
operating system support and interoperability, several modifications have 
been made to the System V networking utilities for the new security features 
present in SVR4.1ES. While not part of the evaluated product, these features 
allow secure network communications within a homogeneous environment. 

A number of significant modifications were made in Release 4 which 
expanded UNIX System V's networking capabilities. These included 
improvements to existing capabilities that added new features such as the 
connection server, the identification and authentication facility, and ID 
mapping. The network enhancements made to SVR4.1ES continue to build 
on this foundation by adding the ability to interconnect with both secure and 
non-secure systems. Note that the ability to interoperate within the same 
security domain is only possible when interconnecting between SVR4.1ES 
systems. 

Basic Networking Utilities (BNU) 

Enhancements were made to BNU for both security and remote 
administration. Remote administration is a new capability that allows a set 
of machines to be administered from a single site. Additional enhancements 
were made to BNU to allow it to take advantage of the security functionality 
added with SVR4.1ES, specifically MAC. The enhanced BNU allows files 
to be transferred between two SVR4.1ES systems retaining the same MAC 
label (for example, Top Secret to Top Secret), or the MAC label can be 
mapped (for example, Top Secret can be mapped to Very Secret). As with 
all the networking utilities, connections can be established to both secure and 
non-secure systems. 

When used with these networking features, the enhanced uucp capabilities 
enable system administrators to perform remote file transfer and execution 
tasks while maintaining system security. Additionally, uucp has been 
redesigned to facilitate use of the connection server and identification and 
authentication features. 

Open Systems Interconnect (OSI) 

The full OSI protocol stack has been implemented in the kernel and uses the 
STREAMS mechanism as a base. Tools are provided for the manipulation of 
stacks and data. The protocol stack and tools of the implementation conform 
to the Government Open Systems Interconnect Profile (GOSIP). Also 
included are tools to assist in the migration from TCP/IP to OSI. 

Distributed File Systems 

UNIX System V Release 4 (SVR4) introduced support for the two most 
widely used distributed file systems, the Network File System (NFS) and 
Remote File Sharing (RFS). 

• NFS provides the interoperability and robustness for a network of 
computers running different operating systems. 





• Both RFS and NFS provide the central administration and control of 
integrated file systems for a network of computers all running UNIX 
System V. 

Additionally, SVR4 provided a set of common administrative utilities to 
manage both NFS and RFS. 

RFS and NFS are protocol-independent and can run over different networks. 
Diskless operation is supported as well, and the requirement for dedicated 
root partitions for diskless clients was eliminated in SVR4. 

Also in SVR4, NFS could be configured to use secure Remote Procedure 
Call (RPC) for increasing the security of the network. For SVR4.1ES, more 
stringent authentication schemes have been implemented for RFS and NFS to 
provide support for the B2 security features. These enhancements include 
support for MAC, least privilege, and auditing. NFS and RFS provide 
support for MAC label mapping. As was the case for BNU, NFS and RFS 
support the ability to interconnect within both homogeneous and heterogeneous 
environments. However, the ability to interpret both MAC labels and 
privilege sets is supported only within a homogeneous environment. 

NFS and RFS provide distinct, largely complementary services. Both are 
implemented within the virtual file system (VFS) framework, and both are 
administered with a unified set of administrative commands. Generally, a 
binary program works transparently with any remote or local file system. 

RFS allows machines running UNIX System V to selectively share 
directories containing files, subdirectories, or devices across a network. In 
Release 4, the RFS feature allowed the server machine to authenticate a client 
machine when the client attempted to establish a virtual circuit. To strengthen 
this mechanism, stronger authentication schemes may be specified when a user 
attempts to establish a virtual circuit with a server. 

NFS allows machines to share files and directories selectively across a 
network. 


Functionality on the Horizon 

The following is a brief summary of some functionality under consideration 
for inclusion in future releases of UNIX System V. 





Secure X/CMW 


One of the most rapidly expanding security markets is the Compartmented 
Mode Workstation (CMW) market. In simple terms, a CMW system is a 
"B1" level operating system with a trusted X server and an Information Label. 
(An Information Label is a MAC-like label assigned to data. It has the 
unique property of "floating" as data is added to the file.) 

Aside from meeting the demands of the CMW and trusted X markets, CMW 
requirements add a variety of features such as an enhanced trusted path 
facility, real-time auditing alarms, and a highly secure software distribution 
facility. Currently, SVR4.1ES is "CMW-ready"; this means that space for the 
Information Label has been reserved in all internal data structures. By 
reserving the space beforehand, binary compatibility between the CMW and 
non-CMW versions of SVR4.1ES is maintained. 

Enhanced Cryptographic Support 

As the size and speed of networks grow, so does the threat that sensitive data 
can be accessed by unauthorized users, or that data may be corrupted or 
compromised. A simple, reliable method of ensuring data security is 
encryption, the transformation of text into a cipher to conceal its meaning. 

The non-physical nature of the electronic media makes it necessary to attach 
some form of encoding to the data in order to properly identify the source, 
authenticate the contents and provide privacy against electronic surveillance. 

In small, internally controlled networks, this type of protection is easily 
provided through a combination of the standard UNIX system crypt facility 
(to protect the data from disclosure) and simple checksum (to indicate 
modification of the data). The standard UNIX system tools utilize private 
key encryption technology. Private Key, as the name implies, requires that the 
encryption key remain private for the data to be secure. Since the private key 
is required to both encrypt and decrypt the data, the key must obviously be 
shared between the parties exchanging the data. As networks grow in size and 
become spread over multiple administrative domains, exchanging the 
encryption key in a secure manner becomes more and more difficult. 

In addition to the problems involved with sharing a private key, the DES 
algorithm used by System V is on the restricted products list of the U.S. 
Government, and any product containing DES cannot be exported. Due to 
these problems, USL has been looking at ways of implementing public key 
technology in System V. Public Key technology is well suited to large, 
public networks since the key used to encrypt the data is public and may be 
freely shared without compromising the security of the encrypted data. 

Kerberos 

With the need for network security increasing, the Kerberos authentication 
system from MIT's Project Athena is being touted as the answer to network 
security problems. While installation of Kerberos will make a network more 
secure and represents a significant step toward increasing the overall level of 
security within a network, it is not by itself the complete solution. With this 
in mind, USL developed the Network Applications Architecture (NAA). 
The NAA will use Kerberos technology as an integral component. The 
version of Kerberos to be included in the USL NAA, as well as the USL 
defined extensions to Kerberos, are under investigation. 





Summary 


This paper has presented a brief overview of the computer security 
marketplace and defined how SVR4.1ES is situated in this market. The 
market for secure computer systems is here now and will continue to grow 
well into the 1990s. The market for these systems is governmental and 
commercial, domestic and international. Comparison to available secure 
systems indicates that SVR4.1ES is the only system available or in 
development which provides the features, assurances, and flexibility required 
to meet the security needs of the commercial and governmental markets 
worldwide. 





Appendix A: Published Security Specifications 


Only a few well known and reviewed security specifications are currently in 
the public domain. The predominant security specification is still the 
TCSEC. In Europe, it is the ITSEC. In the standards arena, TRUSIX, IEEE 
POSIX, and X/Open have all developed specifications. Following is a brief 
description of how SVR4.1ES meets the specifications listed above. 

TCSEC          SVR4.1ES is currently in formal evaluation at the 
               B2 level. The system meets all the requirements at 
               the B2 level and exceeds the B2 level with its access 
               control lists and trusted facility manager. 

ITSEC          The ITSEC provides a mapping of ITSEC 
               requirements to the TCSEC. SVR4.1ES meets the 
               ITSEC F-B2/E4 level. 

POSIX (EEC)    SVR4.1ES meets all of the requirements listed as 
               Mandatory and 90 percent of the requirements 
               listed as Optional. 

POSIX P1003.6 

The IEEE POSIX P1003.6 committee is developing a security interface for 
portable applications. When complete, P1003.6 will become an addendum to 
P1003.1. Since P1003.6 must be translated into a language-independent 
format (it is currently C bindings) before approval as an ISO standard, it is 
expected that approval will take two to three years. SVR4.1ES will conform 
to P1003.6. Early analysis indicates that the majority of the functionality 
required for conformance could be achieved at the command/library level 
with few operating system modifications required. 

Trusted UNIX (TRUSIX) Task Force 

In 1987, the NCSC formed the TRUSIX to provide technical assistance to 
vendors and evaluators involved in developing TCSEC class B3 UNIX 
systems. The TRUSIX group produced rationale and a worked example in 
the following areas: 





• auditing (rationale only, document is currently out for industry 
review) 

• access control lists 

• formal model (review cycle complete; final status of document is 
under determination) 

AT&T/USL was the primary contributor to the TRUSIX access control list 
document, and as such, TRUSIX is entirely consistent with the SVR4.1ES 
implementation. The formal model was developed specifically for use in 
UNIX system evaluations and is the model being used in the SVR4.1ES 
evaluation. The TRUSIX audit document contains rationale only. The 
rationale contained in the document is entirely consistent with the SVR4.1ES 
audit implementation. 

X/Open 

The X/Open Security Working Group (SWG) chose the C2 level of trust as 
the target for their work. The group focused exclusively on the areas of 
identification and authentication and auditing. Traditionally, the X/Open 
SWG has had the lead in defining security auditing interfaces. Before the 
group became dormant, they produced one document (the X/Open Security 
Interface Specification: Auditing and Authentication), which was adopted by 
POSIX P1003.6 as the base for their auditing specification. The X/Open 
auditing specification was co-authored by AT&T/USL and IBM®. While the 
interfaces reflect functionality currently present in SVR4.1ES, they also 
represent functionality requested by various UNIX International special 
interest groups and by USL’s customers. 





Appendix B: References 


[1] Department of Defense. Trusted Computer System Evaluation 
Criteria, DOD 5200.28-STD, December, 1985. 

[2] Bell, D. E. and LaPadula, L. J., Secure Computer System: Unified 
Exposition and Multics Interpretation, MITRE Corporation, MTR-2997, 
March 1976. 





Glossary 


ACL       Access Control List 
BNU       Basic Networking Utilities 
CISR      Commercial International Security Requirements 
CMW       Compartmented Mode Workstations 
cr1       Challenge-Response 1 
DAC       Discretionary Access Control 
DoD       Department of Defense 
EEC       European Economic Community 
ES/MP     Enhanced Security/Multiprocessing 
GOSIP     Government Open Systems Interconnect Profile 
IAF       Identification and Authentication Facility 
ISO/JTC   International Standards Organization's Joint Technical Committee 
ITSEC     Information Technology Security Evaluation Criteria 
JCL       Job Control Language 
JSAN      Joint Army and Navy 
LID       Label Identification 
MAC       Mandatory Access Control 
NAA       Network Applications Architecture 
NCSC      National Computer Security Center 
OSI       Open Systems Interconnect 





RFP       Request For Proposals 
RFS       Remote File Sharing 
RPC       Remote Procedure Call 
SAK       Secure Attention Key 
SWG       X/Open's Security Working Group 
TCSEC     Trusted Computer System Evaluation Criteria 
TRUSIX    Trusted UNIX Working Group 
VFS       Virtual File System 





Transaction Processing Comes to the UNIX® System 


Chris Schoettle 
UNIX System Laboratories 
Australia and New Zealand 
Tel: +61-2-906-8953 
Fax: +61-2-436-4673 
Email: cts@usl.com 


Juan M. Andrade 

Mark T. Carges 

UNIX System Laboratories 


Copyright © 1992 by UNIX System Laboratories 


® UNIX and TUXEDO are registered trademarks of UNIX System Laboratories, Inc. in the US and other countries. 
X/Open is a trademark of X/Open. 







Transaction Processing Comes to the UNIX® System 

Chris Schoettle 

UNIX System Laboratories 

Australia and New Zealand 

c/- AT&T Australia 
Level 11, 60 Margaret Street 
Sydney NSW 2000 
Australia 

Tel: +61-2-906-8953 
Fax: +61-2-436-4673 
Email: cts@usl.com 

ABSTRACT 

Because of current downsizing and rightsizing trends, many traditional 
proprietary On-Line Transaction Processing (OLTP) users are looking to the 
UNIX® System as a more affordable solution for their business-critical 
application needs. However, there is concern about the UNIX System’s ability 
to meet the complex technical needs of OLTP applications. How do the current 
open OLTP products provide equivalent functionality and how do they actually 
surpass the functionality of proprietary OLTP monitors? 

This presentation will establish a Transactional Client/Server® (TCS) 
computing model that enables business-critical applications on UNIX System 
platforms. Based on a request-response paradigm, the TCS model brings 
application throughput, transactional function shipping, data integrity, load 
balancing, prioritization, data security and management of complex database 
systems to the UNIX System. 

Use of the TCS model also allows properties of traditional OLTP to be realized 
within a distributed architecture. Distributing applications across 
heterogeneous hardware platforms, operating systems, networks, databases and 
front-end interfaces creates new demands for open OLTP systems software. 
The transaction processing manager must, in a transparent fashion, enable and 
coordinate application activity, be it transactional or not, across machine and 
data boundaries. The TCS model will be presented within a distributed 
environment and the features needed in the accompanying API addressed. 

Issues of hardware, software, and networking integration will also be discussed. 
With hardware, the OLTP solution must integrate heterogeneous platforms that 
span an enterprise, from the workstation to the mainframe. With software, the 
OLTP solution must enable end-user choice of front-end interface (from GUI to 
bar code reader), resource manager (from RDBMS to printer) and operating 
system (from UNIX System to proprietary), and do so in a mix-or-match 
fashion. With networking, the OLTP solution must provide the transparent use 
of several network fabrics. 



1. INTRODUCTION 


One important factor leading to the emergence of open (e.g., UNIX®-based) on-line transaction 
processing (OLTP) systems is X/Open’s work on distributed transaction processing (DTP). 
X/Open has recently published a reference model for distributed transaction processing 
[XOMDL91] that defines how an application program, a transaction manager, and one or more 
resource managers cooperate to achieve application portability and component 
interchangeability in a distributed transaction processing environment. The model clearly 
points out how application programs can obtain the benefits of these goals via a set of interfaces 
published in its specifications. 

It is worth noting that X/Open’s reference model is not an architecture for a specific system or 
product. Rather, the model provides a framework upon which a standard set of application 
programming interfaces (APIs) and integration interfaces may be specified. This framework also 
provides a basis for interoperability between heterogeneous TP systems and a foundation upon 
which real OLTP systems can be built. 

Thus, the components of the X/Open reference model are necessary, but not sufficient, for 
constructing commercially viable OLTP products. Some readers familiar with X/Open’s model 
have difficulty seeing the relationship between the logical components of X/Open’s reference 
model and existing open OLTP products. The confusion exists because X/Open’s model is not 
an architecture for a particular product. Also, open OLTP systems address a far greater 
number of requirements than those of the reference model. 

This paper discusses the relationship of the X/Open DTP model to a particular open OLTP 
product, the TUXEDO® Enterprise Transaction Processing (ETP) System [TUXEDO91]. It also 
presents an overview of the X/Open model followed by an examination of the requirements for 
an open OLTP environment. During this examination the components of the X/Open model 
that are relevant to an open OLTP environment are underscored. To illustrate how the 
requirements for open OLTP can be met in light of the X/Open DTP model, the TUXEDO ETP 
System will be described. 

2. X/OPEN DTP MODEL OVERVIEW 

The goals of the X/Open DTP model [XOMDL91] are to provide portability of transaction 
processing applications, interchangeability of DTP software components, and interoperability of 
DTP applications across heterogeneous platforms (software or hardware). These goals are 
realized by a software architecture whereby an application program can access several resource 
managers under the coordination of a single transaction manager. Figure 1 illustrates the 
functional components of the X/Open DTP model. 


[Figure 1: the Application Program (AP) uses (1) the Resource Manager Interfaces to reach the 
Resource Managers (RMs) and (2) the Transaction Demarcation Interface to reach the Transaction 
Manager (TM); the TM and the RMs communicate through (3) the XA Interface.] 

Figure 1. Functional Components and Interfaces 

Application Program Component 

The application program (AP) component is the software that implements the end-user’s desired 
functionality. For example, in a banking application, the code implementing the debit and 
credit logic resides in one or more AP components. In the X/Open DTP model, the definition of 
an AP is restricted to a runtime process that directly accesses a transaction manager and a set 
of resource managers. Thus, while a complete transaction processing application will consist of 
many application processes, only a subset of these fit the X/Open DTP model’s definition. More 
precisely, it is only those processes that directly use the interfaces numbered 1 and 2 shown in 
figure 1. Application processes performing administrative tasks, for example, are currently 
outside the scope of the X/Open DTP model. 

Transaction Manager Component 

The transaction manager (TM) component manages the transactions declared by the AP and 
coordinates the completion of transactions across a set of resource managers. A TM is a 
specialized software component responsible for: assigning transaction identifiers, coordinating 
the two-phase commitment protocol among a set of resource managers, and driving recovery 
activities after failures. Note that a TM, as defined in the X/Open DTP model, is usually just 
one component of a system software package called a transaction processing monitor (TPM). 

A TM offers to an AP a small set of verbs for declaring transaction boundaries. This interface, 
labeled number 2 in figure 1, contains verbs for beginning, committing and rolling back a 
transaction. As of the writing of this paper, X/Open has not published an interface 
specification for these services. A TM uses the XA interface [XOXA91], labeled number 3 in 
figure 1, to exchange transaction information with a set of resource managers. An AP’s use of 
the transaction demarcation verbs drives the TM’s use of the XA verbs to a set of RMs. 
Whereas an AP delimits transaction boundaries at a high level, the TM uses the XA interface to 
implement the two-phase commit protocol with a set of resource managers. Thus, both the XA 
interface and the two-phase commitment protocol are transparent to the AP. 
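
The division of labour described above can be sketched in a few lines of Python. This is an illustrative sketch only, not the XA interface or any X/Open-specified API: the class names (TransactionManager, ResourceManager), the method names, and the in-memory "pending" buffers are assumptions introduced here. It shows an AP that uses only begin and commit, while the TM drives the prepare and commit (or rollback) steps on every RM.

```python
from enum import Enum


class Vote(Enum):
    COMMIT = "commit"
    ABORT = "abort"


class ResourceManager:
    """Hypothetical RM: buffers work per transaction and obeys the TM."""

    def __init__(self, name, can_prepare=True):
        self.name = name
        self.can_prepare = can_prepare  # simulate an RM that cannot commit
        self.pending = {}               # transaction id -> buffered updates
        self.committed = []             # updates made durable

    def do_work(self, xid, update):
        self.pending.setdefault(xid, []).append(update)

    def prepare(self, xid):
        # Phase 1: vote on whether this RM is able to commit.
        return Vote.COMMIT if self.can_prepare else Vote.ABORT

    def commit(self, xid):
        # Phase 2: make the buffered work durable.
        self.committed.extend(self.pending.pop(xid, []))

    def rollback(self, xid):
        self.pending.pop(xid, None)


class TransactionManager:
    """Hypothetical TM offering begin/commit/rollback verbs to the AP."""

    def __init__(self, rms):
        self.rms = rms
        self.next_xid = 0

    def begin(self):
        self.next_xid += 1  # assign a transaction identifier
        return self.next_xid

    def commit(self, xid):
        votes = [rm.prepare(xid) for rm in self.rms]
        if all(vote is Vote.COMMIT for vote in votes):
            for rm in self.rms:
                rm.commit(xid)
            return True
        self.rollback(xid)  # any ABORT vote rolls back every RM
        return False

    def rollback(self, xid):
        for rm in self.rms:
            rm.rollback(xid)


def transfer(tm, debit_rm, credit_rm, amount):
    """AP code: it never sees prepare or the two-phase protocol."""
    xid = tm.begin()
    debit_rm.do_work(xid, ("debit", amount))
    credit_rm.do_work(xid, ("credit", amount))
    return tm.commit(xid)
```

In this sketch the transfer function plays the role of the AP: it delimits the transaction at a high level, and a single refusing RM is enough for the TM to roll back the work at all participants.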

Resource Manager Component 

The resource manager (RM) component manages a particular shared resource on the 
application’s platform. Because DTP applications need many different types of resources, even 
within a single transaction, the model shows an AP directly accessing several different RMs. 
Examples of resource managers are SQL RDBMSs, ISAM file managers and client/server 
communications managers. Like a TM, some types of RMs are usually just one component of a 
complete TPM. For example, a client/server, transaction-oriented communication API might be 
part of a TPM, whereas a relational DBMS might not. 













With the XA interface, "DTP-ready" RMs can be plugged in to provide the services an 
application needs. For an RM to be DTP-ready, it must not only support transaction properties 
and the XA interface but it also must not complete transactions independently from a TM. 
That is, in the X/Open reference model RMs are subordinate to a TM and must follow a TM’s 
instructions (via the XA interface) when either committing or rolling back a transaction. This 
ensures that all RMs participating in an application’s transaction remain consistent. 


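[Figure 2: a single process in which the Application Code calls SQL into a DBMS Library, the 
Transaction Demarcation API into the Transaction Manager Library, and the Client/Server API 
into a Client/Server Communication Library; the DBMS and communication libraries each connect 
to the Transaction Manager Library through XA.] 

Figure 2. A Process Structure 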

In terms of a process, an AP within the X/Open DTP model uses the APIs of a TM library for 
transaction demarcation and those of a set of RM libraries for communication and resource 
access. [XOMDL91] also refers to such a process as an "instance of the model." Figure 2 shows 
an example of a process linked to the libraries of a TM, a client/server communications RM, and 
a DBMS. 


TM Domains 

The X/Open DTP model groups a set of processes that use the same instance of a TM into a 
TM domain. There may be several TM domains within an enterprise and each TM domain is 
independently administered. A TM domain can have processes that span several physical nodes. 
Domains are defined in terms of a TM, as opposed to an RM, because of a TM’s governing role 
in extending basic transaction properties across heterogeneous software components. 

The processes within a single TM domain communicate amongst themselves via a 
communications RM. In addition, to allow processes within one TM domain to communicate 
with those in another, a TPM might also have another communications RM that supports TP 
protocols used for interoperability, such as OSI-TP [OSITP91]. Therefore, it is important that 
the APIs for transaction demarcation and communication be able to map to and from both the 
intra-domain as well as the inter-domain protocols. 

3. OPEN OLTP REQUIREMENTS AND THE X/OPEN DTP MODEL 

As OLTP systems and applications are being deployed on open platforms, the list of 
requirements for OLTP system providers is growing. Not only are many of the traditional 
requirements, such as supporting hundreds of users performing short interactive tasks, still 
valid but there are also many new requirements specific to open systems environments. The 
rest of this section explores many of these requirements. Where possible, requirements that are 
addressed by the X/Open DTP model are explicitly pointed out. 

Open OLTP systems have been driven by industry needs to support decentralized applications 
in order to decrease high computing costs. These systems must be able to handle distributed 
and heterogeneous configurations of software, hardware and networks. Also, they must provide 
mechanisms for the integration of multiple tiers of processing, from PCs to Mainframes. Such 
integration requires a consistent set of APIs across a broad span of computing environments, 
and also the coordination of transactions across multiple heterogeneous RMs. 











The X/Open DTP Model addresses the issues of consistent APIs and some level of component 
interchangeability. However, open OLTP systems still require a great deal of flexibility to 
integrate other required software provided by different vendors. What follows is a list of 
requirements that OLTP system providers must meet for open systems platforms: 

Freedom of choice - vendor independence. The main thrust behind open OLTP systems is that 
customers have the freedom to choose the parts of their computing environment based on their 
needs. This includes the hardware they buy, the way in which it is networked together, and the 
different software components, such as the particular DBMS or TPM, that they choose to put on 
it. Proprietary OLTP systems cannot provide this freedom since all the components are 
bundled together in a single package. A big challenge for open OLTP system providers, then, is 
to allow application builders to have choices but to also give them the high performance that 
the bundled proprietary systems have traditionally delivered. 

The X/Open DTP model goes a long way to give application builders the freedom to choose 
some of their software components via the XA interface. With the XA interface, they can 
choose the TM and the RMs they prefer. This will most likely come in the form of a set of XA- 
compliant DBMSs and a TPM. However, there are certain software components that the XA 
interface does not help integrate: systems configuration and administration, runtime 
monitoring, and security. Each TPM and DBMS will have its own way of supporting these 
features until a standard set of integration APIs is defined for these areas. 

Application portability. Open OLTP application developers need the same application 
development environment across different hardware and software platforms. This requirement 
calls for a standard set of APIs that span different computing bases (that is, from the PC, to the 
UNIX server, to the proprietary mainframe). These APIs allow application developers to take 
advantage, for example, of different graphical user interfaces (GUIs) across dissimilar platforms 
while preserving the same OLTP transaction and communication semantics. The X/Open DTP 
model is helping to meet this requirement by committing to publishing APIs (as shown in figure 
1 on the interfaces numbered 1 and 2) that application writers and tools builders can expect to 
be present on open systems platforms. 

Distribution transparency. Communications paradigms in OLTP systems must present location 
transparent communication services to application programmers. That is, one application 
program communicating with another should not know nor care where its partner is located. As 
systems become more distributed, the need for location transparency becomes greater. 

A related requirement for transparency is system-provided data marshaling: since application 
data may be received on a processor with a data representation that differs from the sender’s, 
open OLTP products should transparently convert application data to the appropriate 
representation when necessary. Because there are different de facto and de jure marshaling 
techniques (e.g., XDR and ASN.1), open OLTP systems should allow application builders to 
plug in their preferred method without requiring application program changes. 
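
A pluggable marshaling layer of this kind might look like the following Python sketch. It is an assumption-laden illustration: the two encodings are simple fixed-layout byte orders built with the standard struct module, not real XDR or ASN.1, and the names MARSHALERS, send_record and receive_record are hypothetical. The point is only that the application passes the same record regardless of which method is configured.

```python
import struct


def _make_marshaler(fmt):
    """Build an (encode, decode) pair for one wire format."""
    def encode(record):
        return struct.pack(fmt, *record)

    def decode(data):
        return struct.unpack(fmt, data)

    return encode, decode


# Record layout assumed here: (account number, balance in cents).
# ">" selects big-endian and "<" little-endian; both use standard sizes.
MARSHALERS = {
    "big-endian": _make_marshaler(">Iq"),
    "little-endian": _make_marshaler("<Iq"),
}


def send_record(record, method):
    """Sender side: convert an application record to wire bytes."""
    encode, _ = MARSHALERS[method]
    return encode(record)


def receive_record(wire_bytes, method):
    """Receiver side: recover the record in the local representation."""
    _, decode = MARSHALERS[method]
    return decode(wire_bytes)
```

Switching the configured method changes the bytes on the wire but not the application code that builds or consumes the record.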

Lastly, open OLTP systems should also provide context-sensitive routing. This type of routing 
allows OLTP requests to be transparently routed based on application data values. Context- 
sensitive routing is particularly useful in applications requiring the same processing to be 
performed on related data, stored at different locations. For example, a banking application 
might store all odd-numbered accounts in a database at one site and all even-numbered 
accounts in another database at a different site (so as not to have all its accounts in a single 
place). When a request for an account balance is made, it is transparently routed to the correct 
site based on the account number. The application program is completely unaware that 
such routing is taking place. 
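
The odd/even banking example can be sketched as a small routing table consulted by the system rather than by the application. Everything named here (ROUTING_TABLE, SITES, route, get_balance, the site names and the sample balances) is hypothetical and serves only to illustrate the idea.

```python
# Each routing rule pairs a predicate on the request data with a site name.
ROUTING_TABLE = [
    (lambda account: account % 2 == 1, "site_odd"),
    (lambda account: account % 2 == 0, "site_even"),
]

# Stand-ins for the two databases; balances are in cents.
SITES = {
    "site_odd": {1001: 25000, 1003: 7500},
    "site_even": {1002: 100000},
}


def route(account):
    """Pick the site whose rule matches the account number."""
    for predicate, site in ROUTING_TABLE:
        if predicate(account):
            return site
    raise LookupError(f"no route for account {account}")


def get_balance(account):
    """What the application calls; it never names a site."""
    return SITES[route(account)][account]
```

Because the routing rules live in a table, an administrator could repartition the accounts by editing the table alone, leaving get_balance and its callers unchanged.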





Performance, modular growth, and scalability. OLTP applications require high throughput and 
short response times. To achieve these goals in an open OLTP environment, OLTP systems 
must provide an architecture that allows an application to be decomposed into modules that 
can execute in parallel. Such an application architecture, or computational model, must allow 
for the application modules to run on heterogeneous computing bases. Application modules 
must also be able to be dynamically added to a running system as well as taken out of it. 

Connectivity and interoperability. Open OLTP systems must allow different computing bases to 
be connected according to application builders’ needs. Networks with different transport 
interfaces (e.g., XTI, Sockets, Netbios) and protocols (TCP/IP, X.25, OSI) must be supported. 
This support must be provided while offering APIs that are independent of the particular 
networking fabric used. 

X/Open will be extending the XA interface to allow different DTP-ready communications RMs 
to be plugged into an open OLTP environment. This alternative would allow applications to 
incorporate existing programs built using the APIs of proprietary communications managers 
into an open OLTP application. 

Interoperability among disparate TP environments (open or otherwise) is an important 
requirement. From an investment protection standpoint, open OLTP systems must 
interoperate with existing proprietary systems. As different open OLTP products emerge in the 
marketplace, there is also a requirement for interoperability via standard TP-oriented protocols 
(namely, OSI-TP). The X/Open DTP model’s TM domain concept provides the framework for 
tying together heterogeneous OLTP systems using standard protocols. 

Reliability, robustness, and reconfigurability. In a distributed environment, an open OLTP 
system must be resilient to failures and it must provide mechanisms to prevent application data 
from becoming inconsistent. The latter may occur when distributed requests are processed in 
parallel and some parts fail. OLTP systems provide the concept of a distributed transaction to 
ensure that data at all sites remains consistent. Distributed transactions also simplify the 
construction of reliable distributed OLTP applications that access shared data. Distributed 
transactions are at the heart of the X/Open DTP model and via the XA interface OLTP 
application developers are given the freedom to choose the DTP-ready RMs that best suit their 
needs. 

A reliability requirement of open OLTP systems that is not addressed by the X/Open DTP 
model is the provision for automatically restarting parts of an application that fail. These 
systems should also provide the ability to migrate or reschedule the failed part of a distributed 
application to available sites. 

Security. Distributed environments require authentication of users and authorization of services 
to protect business data from intruders or unauthorized use. Different applications require, 
however, different ways of providing security (e.g., by isolating certain nodes or by using 
software provided by different vendors). For this reason, open OLTP systems should allow 
application builders to incorporate their preferred security mechanisms. Even though the 
X/Open DTP model currently does not address this issue, open OLTP products must recognize 
the importance of security in many business environments. 

Monitoring and administration. OLTP systems must allow systems administrators to tune the 
load distribution of their application, and to schedule the priority of work. In distributed 
environments, there is also an interest in automating the start-up and shutdown of an 
application (or parts of it). Such automated support should be controlled from a central place. 
That is, OLTP systems administrators need to have a centralized view of a distributed 
application. 




4. OVERVIEW OF AN OPEN OLTP SYSTEM 


To illustrate how the requirements of open OLTP can be met in light of the X/Open DTP 
model, the TUXEDO ETP System will be described. First, a high level description of the 
TUXEDO ETP System distributed architecture is given to show the various computing bases 
that an open OLTP system must span [DWYER91]. Next, to motivate how the diverse 
requirements of open OLTP are met with a simple but powerful software architecture, the 
TUXEDO ETP System computational model is described. Lastly, the individual components of 
the TUXEDO ETP System architecture are described. 

The TUXEDO ETP System distributed architecture allows a set of heterogeneous computer 
nodes to cooperate towards the efficient and reliable execution of OLTP applications. Figure 3 
depicts the different computing bases on which the TUXEDO ETP System runs (e.g., 
workstations and UNIX-based server nodes). It also illustrates that it interoperates with other 
OLTP environments, perhaps dissimilar from its own (e.g., IBM MVS/CICS). 


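[Figure 3: a TUXEDO ETP System Domain comprising workstation nodes and System/T nodes, 
interoperating with heterogeneous TP nodes.] 

Figure 3. TUXEDO ETP System Distributed Architecture 

The TUXEDO ETP System distributed architecture has the following main components: 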

• System/T nodes. System/T is the TPM of the TUXEDO ETP System [ACK89]. System/T 
nodes are the fundamental nodes in a TUXEDO ETP System application. They contain the 
parts of the application that use X/Open XA-compliant RMs and the services of the 
System/T TPM to link together different parts of a distributed application. Among other 
things, System/T contains an X/Open XA-compliant TM that coordinates application- 
defined transactions distributed across different nodes. Because the TUXEDO ETP System 
runs on a variety of open systems platforms (i.e., different variants of the UNIX operating 
system), the computing base that a System/T node is defined on is provided by many 
platform vendors. 

• Workstation nodes. Applications can offload CPU cycles used for forms processing (or other 
input activities) from System/T nodes to workstation nodes, thus improving the overall 
performance of an OLTP application. Across the different nodes of a TUXEDO ETP System 
application, programs are written using a single API (called ATMI) to structure the work of 
the application. Therefore, applications can be easily moved from one node to another. 

TUXEDO ETP System applications running on workstation nodes are built as instances of 
the X/Open DTP model (similar to that shown in figure 2). They use the System/T TPM, a 
combined X/Open communications RM and TM, to define distributed transaction 
boundaries and to communicate with other parts of the OLTP application that provide 
access to critical business data (some of which may be stored in X/Open DBMS RMs). 
Because workstation nodes typically serve a single individual and not a large community of 
users, they do not contain DBMS RMs (many of the workstation nodes are, in fact, diskless). 

Applications running on workstation nodes are requesters of OLTP services offered on other 
System/T nodes. The TUXEDO ETP System supports PCs running MS-DOS, OS/2, and 
workstations running various versions of the UNIX operating system. Workstation nodes 
can be connected to System/T nodes using many types of networks (e.g., TCP/IP, OSI, 
Netbios). 

• Heterogeneous TP nodes. These are nodes controlled by other (non-System/T) OLTP 
systems (e.g., IBM MVS/CICS). From the standpoint of a TUXEDO ETP System 
application, heterogeneous TP nodes are viewed as other TM domains in the X/Open DTP 
model. The X/Open DTP model states that the communication between two heterogeneous 
TM domains is based on the OSI-TP protocol. However, as this protocol is still evolving and 
there are no generally available implementations of it, inter-TM domain communication is 
based on proprietary TP protocols, such as IBM SNA LU6.2. In fact, the TUXEDO ETP 
System /Host feature allows interoperability between a TUXEDO ETP System domain and 
IBM MVS/CICS via SNA LU6.2. 

Applications using ATMI, the TUXEDO ETP System programming API, on workstation or 
System/T nodes gain transparent access to services offered on heterogeneous TP nodes. In 
addition, as open TP protocols between TM domains become standard and widely available, 
transactions defined by these applications will be able to span heterogeneous TP nodes. 

In a distributed, heterogeneous environment, as described above, a computational model that 
simplifies the development of an open OLTP application and hides the complexity associated 
with distributed application processing is required. 

Client/Server: A Computational Model for Open OLTP Applications 

A client/server model is ideal for building OLTP applications in distributed environments. This 
approach consists of splitting the application into two types of software components, clients and 
servers. Clients gather input from terminals (or special devices) and construct service requests. 
These service requests refer to operations implemented within a server. Hence, a server’s main 
function is to process clients’ service requests (perhaps according to some priority), and send 
them replies indicating the outcome of their requests. 

The client/server model has important properties for OLTP applications: 

• Modularity. It provides a modular approach to building distributed applications. Such 
modularity allows for extensible and scalable applications that can grow with an enterprise’s 
needs. 

• Performance. By conserving system resources (e.g., many clients can be served by a few 
servers), the client/server model permits applications to maintain short response times and 
high transaction throughput. 

• Data independence. Clients have to know very little about the structure of servers. They 
need to know only the name of the service they want performed and the parameters that 
service expects. Because the service routines hide the actual data access methods, clients 
have no idea what kind of DBMS is being used. In fact, the DBMS accessed within a server 
can be changed with little or no effect on the clients. 

• Location transparency. Because of the separation between client and server processes, 
clients communicate with servers using abstract, location independent names. This allows 
servers to migrate across computing bases depending on resource needs with no effect on 
clients. To maintain the location transparency of servers, it is critical that clients do not use 
network names or addresses when communicating with servers. 
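
The data independence and location transparency properties listed above can be illustrated with a name-based service directory. This is a minimal sketch under stated assumptions: ServiceDirectory, advertise, call and location are hypothetical names, the handler is a stand-in for a real service routine, and no actual networking is performed.

```python
class ServiceDirectory:
    """Hypothetical lookup table mapping service names to servers."""

    def __init__(self):
        self._table = {}  # service name -> (node, handler)

    def advertise(self, service, node, handler):
        # A server announces that it offers a named service on a node.
        self._table[service] = (node, handler)

    def call(self, service, request):
        # Clients supply only the service name and its parameters.
        _node, handler = self._table[service]
        return handler(request)

    def location(self, service):
        # Administrative view; client code has no need for it.
        return self._table[service][0]


def balance_handler(request):
    # Stand-in service routine; a real one would query a DBMS.
    return {"account": request["account"], "balance": 100}


def client_request_balance(directory, account):
    """Client code: no node name or network address appears here."""
    return directory.call("BALANCE", {"account": account})
```

If the BALANCE server is advertised again from a different node, client_request_balance keeps working without modification, which is the behaviour the location transparency property calls for.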

The Client/Server Computational Model within the TUXEDO ETP System 

The TUXEDO ETP System exploits the client/server computational model towards the goal of 
enabling high performance OLTP applications conforming to the X/Open model. Because of the 
special needs of OLTP applications, the TUXEDO ETP System recognizes two programming 
paradigms within the client/server model: request/response and connection-oriented processing. 
These two paradigms are provided through an API, called ATMI, that was designed to meet the 
requirements of open OLTP. In terms of the X/Open DTP model, ATMI is the API to the 
communications RM component of the System/T TPM. ATMI also contains verbs that map to 
the TM component of the System/T TPM for defining the boundaries of distributed 
transactions (see figure 2). 

The request/response verbs of ATMI eliminate the need for a client to explicitly bind to a 
server. Such bindings compromise location transparency and automated fault resilience (e.g., 
transparently migrating service requests to other servers in the case of failures). ATMI’s 
request/response verbs provide support for synchronous and asynchronous service requests. 
Asynchronous service requests allow several requests to be executing in parallel and are 
particularly well suited to distributed environments. In conjunction with the TM verbs of 
ATMI, several requests can be grouped together within a single transaction. Performing service 
requests within transactions simplifies application logic since work is automatically rolled back 
when errors or failures occur. Application programmers also have control over the priority at 
which a service request is handled by a server. Because of its high level of abstraction, ATMI 
handles data presentation services such as transparently encoding/decoding application data as 
it passes between nodes having different processor types. Services defined using ATMI can 
return replies to clients, or forward work on to other servers that take over the responsibility of 
sending the originating client its reply. This "bucket-brigade" style of communication is well 
suited to distributed environments as it keeps a high percentage of servers busy, and working in 
parallel, during peak periods. 
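
As a rough illustration of the difference between synchronous and asynchronous service requests, the following Python sketch issues several requests in parallel and gathers the replies afterwards. It is not ATMI: the `call_service` and `call_services_async` helpers and the "CREDIT" service are hypothetical, and a thread pool merely stands in for the servers.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical services, keyed by abstract name (no host or address involved).
SERVICES = {
    "DEBIT": lambda amount: -amount,
    "CREDIT": lambda amount: amount,
}

def call_service(name, data):
    """Synchronous request: send data to the named service and wait for the reply."""
    return SERVICES[name](data)

def call_services_async(requests):
    """Asynchronous requests: issue every request first, then gather the replies."""
    with ThreadPoolExecutor(max_workers=4) as pool:
        handles = [pool.submit(call_service, name, data) for name, data in requests]
        return [handle.result() for handle in handles]

replies = call_services_async([("DEBIT", 100), ("CREDIT", 100)])
```

The client names only the service, never a server, so the binding problem described above does not arise in the sketch either.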

While the request/response paradigm is simple and powerful, it is not ideal for applications that 
require either bulk data transfer (e.g., sending very large amounts of data) or incremental 
results (e.g., sending a handful of DBMS records at a time back to a client). These two styles of 
communication are best performed by a connection-oriented paradigm that allows a client and a 
server to communicate more than just a single request and response. Part of ATMI, therefore, 
is an interface allowing clients and servers to communicate using a simple, half-duplex 
connection-oriented paradigm while maintaining the premise of location transparency. 
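
A half-duplex conversation can be pictured as a connection on which only one side may send at any moment. The Python sketch below is an illustration under that assumption only; the `Conversation` class and its `pass_control` flag are hypothetical and do not reproduce the ATMI interface.

```python
class Conversation:
    """Half-duplex connection: only the side holding control may send."""

    def __init__(self):
        self.control = "client"   # the initiator starts with control
        self.log = []

    def send(self, sender, data, pass_control=False):
        if sender != self.control:
            raise PermissionError(f"{sender} does not hold control of the connection")
        self.log.append((sender, data))
        if pass_control:
            # Hand the right to send over to the other side.
            self.control = "server" if sender == "client" else "client"

conv = Conversation()
conv.send("client", "list accounts", pass_control=True)
for batch in (["rec1", "rec2"], ["rec3"]):    # incremental results
    conv.send("server", batch)
conv.send("server", "done", pass_control=True)
```

The server returns its records a handful at a time, which is the incremental-results case that a single request and response cannot express.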

Together, both the request/response and the connection-oriented paradigms supported in ATMI 
help open OLTP application developers structure their code for portability, location 
transparency, performance, modular growth, and reliability. 

Architecture of a System/T Node 

A closer look at the System/T TPM, which runs on the System/T nodes of figure 3, will show in 
depth where the components of the X/Open DTP model can be found within a particular 
transaction processing monitor. The System/T TPM provides the fundamental services 
required by open OLTP applications. In addition, the core of a distributed OLTP application, 
the application services, also resides on these System/T nodes. 


Figure 4. The Components of a System/T Node 

TUXEDO ETP System applications are structured according to the client/server computational 
model described in the previous section. Thus, application builders define clients and servers 
using the application development facilities provided by the TUXEDO ETP System. Servers 
may or may not be structured to use XA-compliant DBMSs. Clients use ATMI to structure 
their work into transactions that may or may not include the work done by the servers with which 
they communicate. 

In support of application clients and servers, the System/T TPM coordinates the activities of 
the other components on a System/T node. System/T provides the OLTP services that fulfill 
the requirements for open OLTP systems discussed in section 3. In particular, System/T 
provides the services shown in figure 4 and described in detail below. 


Distributed function shipping. Client processes do not access an enterprise’s data directly. 
Rather, the System/T TPM provides for distributed function shipping whereby a client’s 
request and parameters are sent to a server that can process the request near the enterprise’s 
data. As of the writing of this paper, such services are not yet addressed by the X/Open DTP 
model and its APIs. 

Naming services. System/T uses its naming services to provide location transparency for service 
requests. Clients use abstract names, such as "DEBIT," to call on a server. System/T’s naming 
services are responsible for resolving these abstract names and routing (see below) the request to 
the appropriate server. System/T’s naming services contain information about application 
services located within a TUXEDO ETP System domain as well as about those known to be 
located in other TM Domains. X/Open has not yet addressed naming services for transaction 
processing. 
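
The idea of resolving an abstract service name to the servers that offer it can be sketched in a few lines of Python. The `NameService` class and the server identifiers are hypothetical and say nothing about how System/T stores this information.

```python
class NameService:
    """Maps abstract service names to the servers that advertise them."""

    def __init__(self):
        self._table = {}

    def advertise(self, service, server):
        """Record that the given server offers the named service."""
        self._table.setdefault(service, []).append(server)

    def resolve(self, service):
        """Return every server offering the service, or fail if there is none."""
        servers = self._table.get(service)
        if not servers:
            raise LookupError(f"no server advertises {service!r}")
        return list(servers)

names = NameService()
names.advertise("DEBIT", "server-A@node1")
names.advertise("DEBIT", "server-B@node2")
candidates = names.resolve("DEBIT")
```

Because clients hold only the name "DEBIT", servers can be added, moved or removed by changing the table alone.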

Priority scheduling, context-sensitive routing, and load balancing. An application service can 
have associated with it a set of conditions that help System/T choose, for any given request, a 
server that can best service the client’s request. System/T evaluates context-sensitive routing 
criteria, load factors and service priorities when deciding where a request should be sent. To 
uniformly distribute work throughout a system, System/T keeps usage statistics and uses a 
two-level load balancing scheme. The first level chooses a System/T node based on context- 
sensitive routing criteria and usage statistics while the second level takes advantage of a 
multiple-server, single-queue mechanism to ensure that the first available server will process a 
client’s request. Routing, scheduling and load balancing are not addressed in X/Open’s model. 
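
The two-level scheme can be illustrated with a small Python sketch. It assumes that the first level filters nodes by a routing rule and then picks the least-loaded one, and that the second level is a single queue per node drained by whichever server is idle. The `Node` class, the routing rule and all names are hypothetical.

```python
from collections import deque

class Node:
    """A node with one request queue shared by several servers."""

    def __init__(self, name, region, servers):
        self.name = name
        self.region = region
        self.idle_servers = deque(servers)  # servers waiting for work
        self.queue = deque()                # single queue shared by all servers
        self.load = 0                       # usage statistic: requests accepted

    def enqueue(self, request):
        self.queue.append(request)
        self.load += 1

    def dispatch(self):
        """Level 2: hand queued requests to idle servers in arrival order."""
        assigned = []
        while self.queue and self.idle_servers:
            assigned.append((self.idle_servers.popleft(), self.queue.popleft()))
        return assigned

def choose_node(nodes, request, route):
    """Level 1: keep nodes matching the routing rule, then take the least loaded."""
    eligible = [n for n in nodes if n.region == route(request)]
    return min(eligible, key=lambda n: n.load)

def route_by_account(request):
    """Hypothetical context-sensitive rule: low account numbers live in the east."""
    return "east" if request["account"] < 5000 else "west"

nodes = [
    Node("east1", "east", ["e1", "e2"]),
    Node("east2", "east", ["e3"]),
    Node("west1", "west", ["w1"]),
]
for req in ({"account": 10}, {"account": 9000}, {"account": 20}):
    choose_node(nodes, req, route_by_account).enqueue(req)
```

The two eastern requests land on different nodes because the second one sees that `east1` has already accepted work.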

System administration, monitoring, and reconfiguration services. System/T provides TP 
administrators with a set of capabilities for application administration that include: transaction 
monitoring, on-line reconfiguration, starting and shutting down components of the application, 
and scheduling service requests at certain times of the day. System/T’s administration services 
provide a global view of a distributed OLTP application from a central point. Centralized 
logging of activities, statistics and errors allows administrators to closely monitor an OLTP 
application. Also, application configuration services allow administrators to centrally specify all 
machine dependencies, networking characteristics, service and server parameters. Such services 
are currently not addressed in the X/Open DTP model. 

Administration between different TM domains is a particularly interesting area in which 
standards are greatly needed. For example, OSI-CMIP could eventually be used but it would 
first require the definition of a standard application context that includes the description of the 
different managed entities of an OLTP system. Such protocols are not yet addressed in the 
X/Open DTP model. 

Fault management services. System/T increases the availability of an application by allowing 
servers and services to be replicated across several System/T nodes. Also, System/T monitors 
the viability of the different components in a System/T node, such as machines, networks and 
processes. When a failure is detected, automatic recovery or automatic server re-start is 
performed. System/T also provides an automated facility for server migration to a backup 
System/T node when a node goes down. The fault management services also allow both 
System/T and application programs to record errors and activities in a central audit log. Such 
services are currently not addressed in the X/Open DTP model. 

Security services. System/T’s security service authenticates a client’s identity before allowing it 
to perform work in an application. While this mechanism is based on Kerberos, it was designed 
such that application builders can plug in their preferred mechanism if they wish to augment or 
replace the mechanism provided by System/T. This can be accomplished by supplying an 
application-specific security mechanism that meets a published System/T internal interface. 
Security is currently not addressed in the X/Open DTP model. 


Data encoding/decoding services. System/T provides a data presentation service (known as 
typed buffers) with ATMI. Applications communicate with typed buffers and System/T 
transparently encodes/decodes typed buffers when necessary. Applications can create their own 
typed buffers and define the encoding/decoding mechanism that shall be used by System/T 
when the typed buffer is transferred to a machine of a different processor type (i.e., one having 
different data representations). As this feature is related to the API an application programmer 
uses, it is addressed by the X/Open DTP model under the area of communications RM APIs. 
To date, no such API has been published by X/Open. 
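
The need for encoding arises because machines of different processor types may store the same value in different byte orders. The Python sketch below shows one conventional remedy, a fixed network byte order on the wire. The record layout and function names are hypothetical and are not the typed-buffer mechanism itself.

```python
import struct

# Hypothetical record layout: 4-byte account number, 8-byte float amount.
# "!" selects network (big-endian) byte order, independent of the host CPU.
DEBIT_FORMAT = "!id"

def encode(account, amount):
    """Convert host values to a machine-independent byte string."""
    return struct.pack(DEBIT_FORMAT, account, amount)

def decode(buffer):
    """Convert the byte string back to host values on the receiving machine."""
    return struct.unpack(DEBIT_FORMAT, buffer)

wire = encode(1234, 99.5)
account, amount = decode(wire)
```

Sender and receiver agree only on the wire format, so neither needs to know the other's processor type.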

Distributed transaction management services. These services perform distributed transaction 
management through the use of a two-phase commitment protocol and logging service 
[HESSEL90]. These services map directly to the concept of the TM in the X/Open DTP model. 
System/T DTP services use the XA interface to coordinate transactions with DTP-ready RMs. 
These services also coordinate transactions across TM Domains via the Inter-Domain services. 
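
For readers unfamiliar with two-phase commitment, the following Python sketch shows the bare protocol: collect a vote from every resource manager, log the decision, then tell every participant the outcome. It is a toy under simplifying assumptions (no timeouts, no recovery from the log), and the class and function names are hypothetical rather than part of the XA interface.

```python
class ResourceManager:
    """A toy resource manager that votes in phase one and obeys in phase two."""

    def __init__(self, name, can_commit=True):
        self.name = name
        self.can_commit = can_commit
        self.state = "active"

    def prepare(self):
        """Phase 1: vote yes only if the work can be made durable."""
        self.state = "prepared" if self.can_commit else "aborted"
        return self.can_commit

    def commit(self):
        self.state = "committed"

    def rollback(self):
        self.state = "aborted"

def two_phase_commit(rms, log):
    """Commit only if every resource manager votes yes; otherwise roll back all."""
    votes = [rm.prepare() for rm in rms]          # phase 1: collect votes
    decision = "commit" if all(votes) else "rollback"
    log.append(decision)                          # record the decision before acting
    for rm in rms:                                # phase 2: broadcast the decision
        if decision == "commit":
            rm.commit()
        else:
            rm.rollback()
    return decision

log = []
good = [ResourceManager("db1"), ResourceManager("db2")]
bad = [ResourceManager("db1"), ResourceManager("db2", can_commit=False)]
good_result = two_phase_commit(good, log)
bad_result = two_phase_commit(bad, log)
```

A single "no" vote in the second run is enough to roll back the work of both participants.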

Request queuing and scheduling services. These services allow applications to store requests in a 
stable queue and to control their delivery to application servers at a later time. Some of these 
service requests can be clocked services, that is, services that are scheduled to be executed at the 
same time every day. These services are currently not addressed in the X/Open DTP model. 

Data entry services. The data entry service provides efficient forms management for 
asynchronous terminals connected to UNIX system sites. These services are responsible for 
translating the data entered by an end user into specific service requests and sending those 
requests to servers. The data entry service is a TUXEDO system-supplied client that uses 
ATMI. As such, it uses ATMI to define transaction boundaries around the service requests that 
it issues. Data entry services are currently not addressed in the X/Open DTP model. 

Network handling services. This component simplifies remote communication with other 
System/T nodes. System/T provides a UNIX system process, called the System/T Bridge, 
which is in charge of this remote communication. A System/T Bridge routes a distributed 
service request to the System/T node and server where the service request should be executed. 
The Bridge’s access to a network is abstracted through a special networking library that allows 
uniform handling of different networking interfaces (e.g., X/Open’s XTI, Sockets, APLI, 
Netbios). This interface is a published System/T internal interface. 

Workstation handling. System/T allows applications to migrate client code to PCs or 
workstations, and, therefore, remove the character processing overhead associated with UNIX 
system asynchronous terminals. In addition, it allows application builders to utilize the 
graphical user interfaces on the platforms of their choice. System/T manages numerous 
workstations with a specialized System/T client that multiplexes the processing required for 
each workstation. This client, called a workstation handler, is built upon System/T’s published 
networking abstraction. It therefore can manage different types of networks and networking 
protocols. The workstation handler uses other System/T services to route service requests 
generated by workstation or PC client applications to the appropriate servers. PCs are not 
specifically included (nor are they precluded) in X/Open’s DTP work. 

Inter-domain services. System/T provides interoperability with nodes located in another TM 
Domain. The TUXEDO ETP System currently provides access to applications located in an 
IBM MVS/CICS environment through a special handler that uses the X/Open CPI-C interface 
to SNA LU6.2. Eventually, interoperability among heterogeneous systems will be provided via 
the OSI-TP protocol. The X/Open DTP model states that X/Open will provide an interface to 
the services of the OSI-TP protocol to facilitate such interoperability. 

The above list illustrates how the requirements specified in section 3 for open OLTP systems 
can be met while adhering to the concepts in the X/Open DTP model and its APIs. 


5. SUMMARY 

This paper discussed how the X/Open DTP model components and APIs fit into the broader 
context of an open OLTP environment. It also pointed out those open OLTP requirements that 
still need to be addressed by X/Open or other standards bodies. 

The X/Open work has provided a good basis for the emergence of open OLTP products in the 
marketplace. X/Open’s efforts to provide portable DTP applications, interchangeable DTP 
components, and interoperable DTP applications across heterogeneous platforms are the 
necessary first steps towards the formal definition of open OLTP systems. Much work, however, 
remains to be done to provide a complete framework for open OLTP environments. 

The TUXEDO Enterprise Transaction Processing System was used in this paper to show how 
the X/Open DTP model components are incorporated into a commercially viable OLTP 
product. 

REFERENCES 

[ACK89] J. M. Andrade, M. T. Carges, K. R. Kovach, "Building a Transaction Processing 
System on UNIX Systems", UniForum Conference Proceedings, February 1989, 
San Francisco, CA. 

[DWYER91] T. J. Dwyer, "Enterprise Transaction Processing", UniForum Conference 
Proceedings, January 1991, Dallas, Texas. 

[HESSEL90] M. R. Hesselgrave, "Considerations for Building Distributed Transaction 
Processing Systems on UNIX® System V", 1990 UniForum Conference 
Proceedings. 

[OSITP91] ISO/IEC, "Distributed Transaction Processing - Part 3: Protocol Specification", 
ISO/IEC 10026-3, JTC 1/SC 21 N, Project 1.21.34, Draft, May 1991. 

[TUXEDO91] UNIX System Laboratories, "TUXEDO® ETP System Release 4.2 Product 
Overview", 1991. 

[XOMDL91] X/Open, "Distributed Transaction Processing: Reference Model", X/Open 
Company Ltd., XO/GUIDE/91/020. 

[XOXA91] X/Open, "Distributed Transaction Processing: The XA Interface", X/Open 
Company Ltd., XO/CAE/91/300. 


NETWORK BACKUP AND ARCHIVAL STRATEGIES 
Paul Templeman, Sequel Technology 


ABSTRACT 

"Oh, I'll just knock up a couple of scripts to do that!" Probably one of the most widely 
used phrases in today's Unix world. Scripts control a vast array of day-to-day Unix 
administrative tasks, doing everything from cleaning up log files automatically, to 
shutting down the system. 

Among the main tasks for which scripts have been used in the past are backup, restore 
and archive functions. This is fine when you have one server with a 
locally attached tape drive whose capacity exceeds the system's fixed 
disk capacity. But how does an administrator handle things when the scenario gets 
more complex: backing up over a network, using remote devices, scheduling automatic 
backups, restoration by users, file revisions, and backing up large disk farms? 

This paper looks at backup and archiving strategies that can be employed by system 
administrators, and how an administrator can assess the strategy that best suits their 
own environment. Available utilities, both standard UNIX ones, and third-party 
utilities, are covered. Hardware options, such as tape, optical, and jukebox facilities are 
also reviewed. 

A number of sample scenarios are analysed, to see what combination of software and 
hardware systems gives the 'best fit'. This will include looking at some of the 
problems faced by administrators on a day-to-day basis, including backing up live 
and raw RDBMS filesystems, tape management, and performance issues. 

Finally, the paper looks at the future direction of backup and archive solutions, including 
such ideas as the 'Epoch' strategy. 

This paper aims to highlight the importance of thinking seriously about defining an 
organisation's backup and archival strategy, and which tools can be employed to ensure 
the data integrity of systems. 


Paul Templeman 
Technical Director 
Sequel Technology 


© Sequel Technology, 1993 


Network Backup & Archival Strategies 








SOFTWARE - STANDARD UNIX UTILITIES 


Until recent times, UNIX tools have mostly been command line based. UNIX data 
management tools are no exception. For all the talk of client/server, GUIs, LANs and 
WANs in today's UNIX environment, the predominant data management tools haven't 
changed very much in recent times. UNIX systems administrators still mainly use 
dump and backup, cpio, tar or dd to back up their systems. 

A major part of a system administrator’s job involves creating shell scripts that utilise 
one of the standard backup utilities to back up and restore the data of the systems being 
administered. By combining additional commands, such as rsh and find, with one of the 
standard backup utilities, administrators can create complex routines to meet the 
majority of an organisation's data management requirements. 

The question, however, is whether these tools, although very powerful, are 
suitable as a standalone backup and archive solution in today's complex networking 
environments. Most of these standard tools are fifteen to twenty years old, and 
today's complex environments did not exist, and so were not considered, when the 
tools were designed. 


SOFTWARE - THIRD PARTY TOOLS 

As with all tools that have been around for some time, someone always feels they can 
build a better mousetrap. Data management tools are no exception, with a myriad of 
software companies producing replacement and add-on utilities. 

These utilities are either designed to conform to existing standard UNIX utilities and 
interfaces, or are replacements that use proprietary 
technology to provide increased functionality and performance to users. 

Some common reasons for administrators to choose third party applications are: 

- data verification 

- data compression 

- performance 

- file history 

- network support 

- advanced hardware support 

- tape management functions 

- graphical user interface 

- security 

- live file system backup 

- raw and RDBMS file system backup 

- archiving functions 


Some backup applications support only a subset of these features and, as usual, 
the more features, the greater the price tag. It is therefore important to design a 
backup strategy for your organisation before working out which backup application to 
use. Once you have a backup strategy you can 
make an informed decision about which backup product is most suitable. For example, 
if you have a heterogeneous UNIX environment you don't want a product that will 
support only a subset of your equipment. If you have a single server with character 
based terminals, a product that was designed around a network environment may not 
be appropriate. 

Some of the more common third party backup solutions available are: 

- Budtool from Delta Microsystems Inc. 

- Networker from Legato Systems Inc. 

- CTar from Microlite Corporation 

- BRU from Enhanced Software Technologies Inc. 

- Backup.unet from Raxco Inc. 

- Arcserve/open from Cheyenne Software Inc. 

Budtool, BRU, and Backup.unet have been designed around the open systems 
concept, and act as wrappers around standard UNIX utilities, such as dump, cpio, tar 
and dd. CTar is a replacement for UNIX's standard tar utility, and is suited to a single 
server environment where network and archive capabilities are not required. 
Networker and Arcserve/open are two network based products that have introduced 
proprietary backup solutions for the UNIX environment. Interestingly enough, 
Arcserve/open originated in the Novell network arena, and has moved from 
that environment to encompass the UNIX market. Each product has situations that it 
fits well, but it is essential that you know what your requirements are 
before you decide which products to evaluate. Spending some time looking at the 
issues up-front will save a lot of heartache later on. 


HARDWARE 

Hardware is the other side of the backup coin, and again you need to be aware of what 
your requirements are before you can choose what hardware environment you need. 

There are three main technologies currently being used for backup and archive 
solutions: 

- Tape Drives 

- Optical Drives 

- Jukeboxes (Robotic Media Handling Systems) 


HARDWARE - TAPE DRIVES 


A number of tape drives are used in today's computing environment. The 
main drive types are: 

- 1/4" streaming tape drives 

- 1/2" reel-to-reel tape drives 

- DAT (4mm) tape drives 

- Exabyte (8mm) tape drives 

The 1/4" and 1/2" tape drives have been around for a long time and are commonly 
used for backing up small servers or individual workstations. Their tapes tend to be 
low in capacity and performance. 

DAT (Digital Audio Tape) and Exabyte tape drives are now the most commonly used 
tape drives for large capacity backup solutions. Both drives back up between 1.3GB and 
5GB of data per tape, are high performance and have random search facilities. 

Traditionally DAT has tended to be somewhat slower than the Exabyte 
units, and until recently came only in 1.3GB and 2GB tape sizes, whereas Exabyte offered 
2.5GB and 5GB drives. Until a 5GB DAT model was recently introduced, DAT drives 
relied on compression to reach up to 10GB of storage per tape. It is still worth 
checking, though, whether the capacity a salesman is quoting you is before 
or after compression. 

Both DAT and Exabyte tape drives have advanced features that also need to be 
supported by custom device drivers and application software. Such features as high 
speed search and Table of Contents are two examples of hardware features that need 
support at the device driver and application level. It is not possible for tar, for example, 
to do a high-speed restore without additional driver and application support, as the 
physical tape block number that the file(s) starts on is needed and this can only be 
recorded in some sort of file history database that was generated when the backup was 
created. 
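
The role of the file history database can be shown with a deliberately simplified Python sketch: as files are written to tape in order, the starting block of each is recorded so that a later restore can seek straight to it. The block size, the absence of archive headers and the `build_history` function are all assumptions made for illustration, not the format used by tar or by any product.

```python
def build_history(files, block_size=512):
    """Record the starting tape block of each file as it is written in order."""
    history, next_block = {}, 0
    for name, size in files:
        history[name] = next_block
        next_block += -(-size // block_size)   # blocks used, rounded up
    return history

# (file name, size in bytes) in the order written to tape
history = build_history([("a.dat", 1000), ("b.dat", 512), ("c.dat", 1)])
start_block = history["c.dat"]
```

Without such an index, a restore has no choice but to read the tape from the beginning until the wanted file appears.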


HARDWARE - OPTICAL DRIVES 

Optical storage, when launched, was supposedly going to be the backup medium of the 
future. It has got off to a slow start, however. There are two types of technology for 
optical storage: WORM (Write Once Read Many) and Erasable Optical. Although a 
lot faster than tape, optical storage is still relatively slow (around 35ms access time) 
compared with magnetic drives. Capacity is also relatively low, around 500MB per 
side of an optical disk, compared with 5GB on an 8mm tape, and up to 20GB with 
compression. Optical media is also relatively expensive compared with tape storage. 
In addition, because of the low capacity, jukebox hardware is often required 
to handle the optical platters, and this is still pricey. 


HARDWARE - JUKEBOXES (ROBOTIC MEDIA HANDLING SYSTEMS) 


The latest in backup hardware is optical and tape robotic media handling systems. The 
units come in two flavours, either a stacker unit or a jukebox unit. The main difference 
between the two is that a stacker unit is purely sequential: it loads the first tape, 
then the second, and so on. The jukebox, on the other hand, utilises intelligent robotic 
media handling facilities and, combined with appropriate software drivers, allows 
individual tape recognition and selection. 

Which hardware option you choose depends on what you are using the device for. If 
you are simply doing backups, then a stacker is probably most appropriate. If, however, 
archiving or fast file restoration is a priority, then a jukebox unit will be more 
appropriate. A jukebox does need greater software support to provide this increased 
functionality, however. This support is being supplied by a number of third party 
software companies. 


BACKUP VS ARCHIVING 

This is probably one of the most important issues when determining your backup strategy 
and assessing what products may suit your needs. 

A backup, networked or otherwise, is simply a method of data security and only needs 
to be restored on an infrequent basis. A backup may consist of one or multiple tapes, 
and tapes may be grouped and rotated. If this is all that you require then neither optical 
drives nor intelligent robotic jukeboxes are usually required. A single tape drive or tape 
stacker is probably the most appropriate depending on capacity required. Advanced 
features such as rapid file retrieval, tape append, file history, graphical user interfaces 
and user backup/restore functions may not be necessary. 

Archiving, on the other hand, is the removal of files from hard disk to offline storage to 
free up disk space, while still allowing individual files or groups of files to be selected 
and rapidly restored. Archiving requires greater hardware and software support, and therefore 
puts the solution into a higher price bracket. Optical drives and intelligent tape 
jukeboxes with high speed search facilities are the most appropriate hardware 
components for archiving, depending on how quickly you need files restored. With an 
optical system, retrieval can be done in a couple of seconds, whereas a tape jukebox 
equipped with high speed search will take about three to four minutes to restore a file. 
The optical storage method is also much more expensive to implement, as is to be 
expected, due to the lower capacity compared with tape, and the increased complexity 
of the optical platter handling mechanisms. 

Often an organisation's needs require a combination of backup and archiving facilities, 
in which case an intelligent tape jukebox solution is probably most appropriate. 


Regardless, it is important for you to work out which solution is appropriate: backup, 
archiving, or a combination. Consultation with end users 
is often important here, as a need for archiving facilities may not be intuitively 
obvious. For example, a project group may be generating large data analysis files, 
which are currently being stored on disk "in case" they are needed later and, because of 
the long time required to regenerate the results, cannot feasibly be wiped off 
the disk. This is a classic case for the use of archiving, but unless the users tell IT what 
they are doing, the need may not be apparent. 


COMMON BACKUP AND ARCHIVING ISSUES 

UNIX based servers and workstations are becoming more and more popular. UNIX 
networks are appearing all over the place, and no longer are UNIX systems put into 
the academic and technical pigeonhole. Whether we like it or not, UNIX systems are 
now big business and big dollars. UNIX system power is ever increasing, with ever 
larger disk capacity, and ever more complex applications. 

On the backup front, administrators are still using the standard UNIX backup facilities 
such as dump, cpio, tar and dd. These utilities haven't changed much since they were 
introduced to the UNIX environment fifteen to twenty years ago. Although still very 
powerful, they are not on their own sufficient for corporate and government backup 
and archiving requirements. In today's world of graphical user interfaces, a user simply 
won't wear having to enter 'cpio -icvbdum "project1*" < /dev/rst0' to restore a file 
from tape, after he or she has organised for the operators to insert the correct tape into 
the drive. This is just not acceptable to a user who, quite often, has no concept of the 
underlying operating system or network. He or she may only know how to use the 
application that is being run. 

Another common problem in today's commercial UNIX environment is the increasing 
disk capacity being used, and the decreasing time available to perform system backups 
on a daily basis, i.e. the backup 'window' is becoming smaller. Thus 
administrators are required to back up larger amounts of data in less 
time. Even with the increased performance of newer backup hardware, it is often 
not possible to back up the required data in the available time with a hardware-only 
solution. 

Another major issue confronting the modern UNIX administrator is the problem of 
variations in command syntax and data formats. SunOS uses dump as its main backup 
method. IBM's AIX uses backup, and has no dump command as under SunOS. AIX 
and Intergraph's version of System V both have cpio, but Intergraph uses the -local 
switch on the find command to back up local file systems, whereas AIX's find command 
uses the -fstype switch to do this, and so the list goes on. Heterogeneous networking is 
not always as easy as vendors would like us to believe. 


In some organisations even combining hardware and software solutions still doesn't 
allow them to back up their systems in the time available in the backup window. This 
may be due to the large amount of data, or because the system is a 24-hour on-line system. 
The only solution then is to back up a 'live' filesystem. Normally a UNIX system is best 
backed up in single user mode, but mostly this is inconvenient, and in a network 
environment is often almost impossible. Backing up a live filesystem requires the use of 
flock() and lock() and, to be effective, needs kernel modification. Sun's 
Backup Co-pilot is an example of such a kernel modification. 

Backing up live filesystems is often the only option for large disk farms. As an 
example, consider the following environment: 

- networked UNIX workstations 

- a 36GB 'disk farm' 

- weekly full backups with daily incrementals 
(where an incremental = 20% of a full backup) 

- full backups are staggered 

- moderately active filesystems 

- a 250 to 300 KB/sec sustained transfer rate per backup 

This example would result in 11GB being backed up each night, taking ten hours 
using a single backup stream, or five hours with a dual backup stream. Thirty-six GB 
seems a lot of disk, and it is, but we are already seeing 1.3GB drives being shipped as 
standard with some workstations, so 36GB isn't so large after all. If all this seems 
unremarkable, consider the dilemma of needing to back up a 100+GB disk farm in an 8 hour backup 
window. 
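
The figures in this example can be reproduced approximately with a short calculation. The Python sketch below rests on two assumptions that the text does not state: that staggering means one seventh of the disk farm receives its full backup each night while the rest receives an incremental, and that 1GB is taken as 1,000,000KB. The function names are illustrative only.

```python
def nightly_volume_gb(total_gb, days=7, incremental_fraction=0.2):
    """Staggered fulls: 1/days of the data gets a full, the rest an incremental."""
    full_part = total_gb / days
    incremental_part = (total_gb - full_part) * incremental_fraction
    return full_part + incremental_part

def backup_hours(volume_gb, rate_kb_per_sec, streams=1):
    """Elapsed hours to move volume_gb at the given sustained rate per stream."""
    kb = volume_gb * 1_000_000          # decimal units: 1GB = 1,000,000KB
    return kb / (rate_kb_per_sec * streams) / 3600

volume = nightly_volume_gb(36)                   # about 11.3GB per night
single = backup_hours(11, 300)                   # about 10.2 hours
dual = backup_hours(11, 300, streams=2)          # about 5.1 hours
needed_rate = 100 * 1_000_000 / (8 * 3600)       # KB/sec for 100GB in 8 hours
```

At the lower rate of 250 KB/sec the single-stream time rises to roughly 12 hours, and the 100GB case needs an aggregate rate of about 3,500 KB/sec, more than ten times what one stream sustains here.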

If you are in a network environment, you also have the added problem of network 
saturation. The previously mentioned example on an Ethernet backbone would mean 
near total bandwidth saturation. This is fine if the backup is being done at night and 
nothing else wants to use the network, but on a cluttered corporate-wide backbone, 
transferring 11GB over the network to be backed up can cause extensive network 
traffic problems. 

Another common problem encountered by administrators is the backup requirements 
of RDBMS partitions, i.e. raw partitions. Often, administrators export the database 
information to log files, and then simply back up the log files as part of the normal 
backup procedure. This does have a downside: you need to keep sufficient disk space 
available for the logs, and the export requires processing time. The other option 
is to back up the raw partition, which means either that the database can't be in use or that special 
backup utilities from the RDBMS vendor need to be used. If possible your backup 
software should allow for the use of these special utilities. 


A CASE STUDY - BUDTOOL FROM DELTA MICROSYSTEMS INC. 


To see how some of the third party software vendors have tackled the problem of data 
management let's look at Budtool, a product from Delta Microsystems Inc. in the US. 

Budtool acts as a wrapper around the collection of UNIX utilities, such as dump, tar, 
cpio and dd. The application has been designed to adhere to the major industry 
standards and utilises an X Windows based graphical user interface. Budtool is 
available for Sun, IBM RS/6000 and HP 9000 series systems, and can back up or archive 
any system that supports rsh or NFS. Budtool, as it is based on X11, can be displayed 
on any workstation or terminal that supports X11, Motif or Open Look. 

There are three main concepts in Budtool: 

- a backup server 

- a media server 

- a backup client 

A backup server is the system where Budtool is run. Normally this is where all the 
volume, media, file history and request databases are kept, along with miscellaneous 
configuration files. 

A media server is a machine that has tape units physically attached to it, and is 
responsible for the physical tape or optical hardware, jukebox or otherwise. This is 
accomplished through custom device drivers supplied by Delta Microsystems for a 
range of supported hardware. Multiple Media Servers can be controlled by a single 
Backup Server, and a Backup Server can also act as a Media Server. 

A Backup (or Restore) Client is a machine on the network that is required to have its 
data backed up or archived. Budtool requires no special software to be installed on a 
Client, with the only requirement being that the Client system has one of the standard 
backup utilities, and supports rsh or NFS. 

Budtool allows the setting of backup or archive 'requests' which can specify what is to 
be backed up, using what method (e.g. dump), and which Media Server should be used. 
These requests can then be grouped together and referred to as a single entity. 

Backup requests and backup groups can be run manually or can be scheduled. Budtool 
includes a full calendar based scheduler to allow backups to be scheduled for up to a 
year in advance, and allows for the inclusion of Public Holidays (something which a 
standard cron entry cannot do). 
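
To illustrate why holiday awareness needs more than a fixed cron pattern, the following minimal sketch builds a ten-day schedule around Easter 1993. It is not Budtool's scheduler: the holiday set, the policy (a full backup falls due on Friday and is deferred to the next working day if Friday is a holiday) and all names are assumptions made for illustration.

```python
from datetime import date, timedelta

# Hypothetical holiday calendar (Good Friday and Easter Monday, 1993).
PUBLIC_HOLIDAYS = {date(1993, 4, 9), date(1993, 4, 12)}

def is_working_day(day):
    """Monday to Friday, excluding public holidays."""
    return day.weekday() < 5 and day not in PUBLIC_HOLIDAYS

def schedule(start, days):
    """Map each working day to 'full' or 'incremental'.

    Assumed policy: a full backup falls due every Friday; if that Friday
    is a holiday, the full backup runs on the next working day instead.
    """
    plan = {}
    full_owed = False
    for offset in range(days):
        day = start + timedelta(days=offset)
        if day.weekday() == 4:          # Friday: a full backup falls due
            full_owed = True
        if not is_working_day(day):
            continue                    # nothing runs; any owed full carries over
        plan[day] = "full" if full_owed else "incremental"
        full_owed = False
    return plan

for day, level in schedule(date(1993, 4, 5), 10).items():
    print(day.isoformat(), level)
```

A cron entry can express "every Friday" but has no notion of a deferred run, which is the kind of rule a calendar based scheduler makes possible.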





Budtool by default supports three backup utilities or 'classes': dump, cpio, and tar. This 
includes support, via additional utilities, for file history, rapid file search, and true 
append to tape facilities for each of the three standard 'classes'. Other backup 'classes' 
can be added by the administrator to allow for support of additional backup utilities 
such as dd and backup. In addition, each 'class' allows pre and post processing, e.g. to 
allow the use of RDBMS utilities such as Oracle's exp utility. This concept of backup 
'classes' that supports any utility that uses standard input and output allows a great deal 
of flexibility and customisation. 
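
The idea of a backup 'class' with pre and post processing can be sketched as a small data structure. This is an illustration only, not Budtool's configuration format: the class table, the `plan` function, the script names `./export_database.sh` and `./remove_export.sh`, and the tape device path are all hypothetical.

```python
from dataclasses import dataclass, field

@dataclass
class BackupClass:
    """A named way of producing a backup stream on standard output."""
    name: str
    command: str                                # template; {path} is filled in per request
    pre: list = field(default_factory=list)     # commands run before the backup
    post: list = field(default_factory=list)    # commands run after the backup

# Hypothetical class table. The two scripts stand in for vendor export tools.
CLASSES = {
    "tar": BackupClass("tar", "tar cf - {path}"),
    "dd": BackupClass("dd", "dd if={path} bs=64k"),
    "database": BackupClass(
        "database",
        "tar cf - {path}",
        pre=["./export_database.sh"],
        post=["./remove_export.sh"],
    ),
}

def plan(class_name, path, tape_device):
    """Return the shell steps for one request; nothing is executed here."""
    cls = CLASSES[class_name]
    steps = list(cls.pre)
    # Any utility that writes to standard output can be piped to the device.
    steps.append(f"{cls.command.format(path=path)} | dd of={tape_device}")
    steps.extend(cls.post)
    return steps

for step in plan("database", "/export/db", "/dev/rmt0"):
    print(step)
```

Because each class is just a command that writes to standard output, adding a new utility means adding one table entry rather than changing the surrounding logic.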

On the user side of things, due to the use of X11, users can be provided with a GUI on 
their own workstation that allows them to back up and restore files of choice, as long 
as they have the correct security permissions. 

For every backup run by Budtool, a file history database is created. Combined with 
the volume database, this not only allows users to select a file and have 
Budtool automatically retrieve it, but also allows them to select which revision 
of the file to restore. Budtool allows the creation of database size rules to 
automatically control the amount of disk space taken up by the file history and 
volume databases. 

Budtool also incorporates media management. Using its label expression utility, it 
allows for automatic recycling, periodic duplication for off-site storage, 
automatic tape retirement, and volume expiration dates. 

For live filesystem backups, Budtool is compatible with such products as Sun's Backup 
Co-Pilot. 

Budtool supports a number of output devices, including DAT and Exabyte tape units, 
optical drives, UNIX filesystems, tape stackers, and intelligent optical and tape 
jukeboxes. 




A CASE STUDY - EPOCH FROM EPOCH SYSTEMS INC. 


Epoch is an innovative move into the arena of automatic network backup and file 
migration that provides one possible solution to the problem of backing up large disk 
farms now and in the future. 

Epoch consists of an Epoch-2 data server, Epochmigration, and Epochbackup. The 
Epoch-2 data server consists of a Sun SPARCstation 2 with an array of disk drives, 
optical and tape jukeboxes. Epoch-2 appears as a standard NFS server on your 
network. 

The basic premise of Epoch is that users can NFS mount the Epoch-2 data server from 
their workstation, and use it as a very large normal UNIX filesystem. Behind the 
scenes it's a different story. Epoch is constantly looking at file usage, and as a first 
stage moves least used files from disk to optical, and then at a later stage from optical 
to tape. 
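
The staged migration described above can be sketched as a single pass over a file table. The paper does not say how Epoch measures usage or when it moves a file, so the idle-time thresholds, the `FileEntry` type and the one-tier-per-pass rule below are assumptions chosen purely for illustration.

```python
from dataclasses import dataclass

# Assumed thresholds; the paper does not state how Epoch decides.
DISK_TO_OPTICAL_DAYS = 30
OPTICAL_TO_TAPE_DAYS = 180

@dataclass
class FileEntry:
    path: str
    tier: str        # "disk", "optical" or "tape"
    idle_days: int   # days since the file was last accessed

def migrate(files):
    """Demote each file by at most one tier per pass, based on idle time."""
    moved = []
    for entry in files:
        if entry.tier == "disk" and entry.idle_days >= DISK_TO_OPTICAL_DAYS:
            entry.tier = "optical"
            moved.append(entry.path)
        elif entry.tier == "optical" and entry.idle_days >= OPTICAL_TO_TAPE_DAYS:
            entry.tier = "tape"
            moved.append(entry.path)
    return moved

files = [
    FileEntry("/proj/current.c", "disk", 2),
    FileEntry("/proj/old_report", "disk", 45),
    FileEntry("/proj/1989_data", "optical", 400),
]
print(migrate(files))
print([(f.path, f.tier) for f in files])
```

The user-visible path never changes in this sketch; only the tier recorded against it does, which mirrors the claim that the server still looks like one large filesystem.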

Epoch also includes Epochbackup. Epochbackup backs up any workstation on the 
network that allows its drives to be NFS mounted by the Epoch-2 data server. 
Templates are provided for backup scheduling, and should a workstation be 
inaccessible when a backup is attempted, Epochbackup will automatically reschedule 
the backup for a later time. 

Another tool from Epoch is Epochmigration. Epochmigration runs on Sun and DEC 
workstations and acts as another tier in the migration path. Files are automatically 
moved from the local disk to the Epoch-2 disks, and then to optical and tape as 
appropriate. 

Epoch heads in the right direction in terms of future backup and archiving 
requirements. However, with only two sites in Australia, and a base price of 
AUS$125,000, Epoch is still not for the faint hearted. But as the product matures, Epoch 
will be a major player in the backup and archive race. 






CONCLUSION 


Standard UNIX utilities such as dump, cpio, and tar have been with us a long time, 
and although powerful when wrapped into a shell script, they cannot address some of 
the issues that are affecting organisations' data management. We need to look at 
third-party applications which still utilise the standard UNIX utilities, but provide the 
advanced network and advanced hardware support that is required if we are to keep up 
with the ever increasing data sets that need to be managed. 

Organisations need to be aware of data management issues and look at their own 
internal requirements, and those of their users. By completing a backup strategy, 
organisations will be much more aware of what their backup and archiving 
requirements are, and what third party products they should review. 

Users are no longer prepared to put up with cryptic command lines, and applications 
are now required to be intuitive, and preferably graphically based. Minimisation of 
operator intervention can now be achieved with advanced optical and tape jukebox 
hardware, which is supported by a number of third party applications via custom 
device drivers and utilities. 

Organisations' disk farms are becoming increasingly large. More and more information is 
required to be backed up, in ever decreasing amounts of time. Network saturation is 
becoming more of a problem, as is management of data from large 
RDBMS installations. With the newer generations of workstations and servers 
becoming available with larger and larger disk capacities, new data management 
technology will be required, combining software with multiple hardware formats to 
provide an adequate solution. 


Paul Templeman is Technical Director of Perth based Sequel Technology. Sequel 
Technology provides open systems solutions to the Corporate and Government 
marketplace, and is the exclusive agent for Budtool in Western Australia. Paul can be 
contacted on (09) 417 5713. 






Distributed Object Management 

HARALD REISS 



69 Le Souef Drive 
Kardinya, W.A. 6163 
AUSTRALIA 


lin@rdeinLDIALix.oz.au 


ABSTRACT 

In this paper we would like to highlight some technical aspects of object management in a 
distributed environment, its implementation and its influence in software development. 

Object oriented design is seen as the most promising way to get better software and to reuse 
its components. The management and storage of networks of objects and methods are a new 
development in computer science. A consortium called the Object Management Group 
(OMG), with members all over the world, has therefore been formed. 

The Object Management Group defines an architecture to transparently manage objects in 
heterogeneous distributed environments. 

An object system, as a collection of objects, provides services to clients through a well-defined 
interface. The OMG (concrete) object model specifies the structure of this interface both to 
client requests and the object implementation. An Interface Definition Language (IDL) 
allows the object interface to be described completely, with all its methods. 

The Object Request Broker (ORB) provides the mechanism to locate the appropriate object 
implementation and sets up the communication path for client requests. The ORB itself is 
defined by its interfaces. An implementation can be directly linked into the client and object 
code, run as a server or be embedded into the operating system. 

The architecture is also open to integrate different types of implementations of object 
systems such as object libraries or object oriented database systems (OODB). 


Computer networking has matured in recent years. We now have facilities for efficient and 
standardised communication between heterogeneous systems. 

The socket transport interface first appeared in 4.2 BSD. This simple but powerful network interface 
opened the software world to a whole class of new applications independent of the underlying physical 
network. Sockets have been superseded by the X/Open Transport Interface (XTI). 

Different hardware architectures use incompatible data type representations. The presentation layer 
service External Data Representation (ONC-XDR) solved this problem by defining a common 
network data type representation. Remote Procedure Calls (ONC-RPC) pave the way to client 
server applications. One of the most popular packages is the Network File System (NFS). 

RPC/XDR is bundled with a definition language to specify data types and procedure calls at a higher level. 
A protocol compiler (rpcgen) creates the appropriate C code, including all stubs required for the client and 
the server process. 
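
As an illustration of what a common network representation means in practice, the sketch below encodes an integer and a string the way XDR lays them out: integers as four bytes in big-endian order, strings as a four-byte length followed by the bytes padded to a multiple of four. The helper names and the `encode_call` function are hypothetical and do not reproduce a real ONC-RPC message header.

```python
import struct

def xdr_int(value):
    """XDR signed integer: four bytes, big-endian."""
    return struct.pack(">i", value)

def xdr_string(text):
    """XDR string: four-byte length, then the bytes padded to a multiple of four."""
    data = text.encode("ascii")
    padding = (4 - len(data) % 4) % 4
    return struct.pack(">I", len(data)) + data + b"\x00" * padding

def encode_call(procedure_number, filename):
    """Hypothetical argument block: a procedure number and one string argument."""
    return xdr_int(procedure_number) + xdr_string(filename)

message = encode_call(1, "hello")
print(message.hex())
print(len(message), "bytes")
```

Whatever the byte order of the sending machine, the bytes on the wire are the same, which is the property that lets heterogeneous clients and servers interoperate.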

Object oriented design, programming and development is nothing new. Smalltalk, a pure object 
oriented language, was developed at Xerox PARC from the early 1970s and has been available since 1980. 
Today the preferred language for object oriented programming is C++. Many C users see C++ as an 
enhanced C or just the next revision. This is an easy step into the object oriented world, even though C++ is 
considered more a class oriented than a true object oriented language. 



Modern "object oriented" network concepts have changed the view of a network. It is no longer seen as a 
number of computers providing different services, but as a set of objects. 

Today it is generally impossible to have direct access to another application's data. Systems are open up to the 
networking level only. Most applications are still closed to their own environment. 

None of the RPC derivatives gives any support for this level. New mechanisms are required to open 
distributed applications and to allow interaction between applications. Several companies have recognised 
this problem. 



The Object Management Group (OMG) was formed in April 1989 by a group of vendors and user 
organisations as a non profit organisation. The organisation "is dedicated to maximising the portability, 
reusability and interoperability of software. The OMG is the leading world-wide organisation dedicated 
to producing a framework and specifications for commercially available object environments." 

The OMG has more than 300 members today, including names like Sun Microsystems, Digital Equipment 
Corporation, Hewlett-Packard, OSF and Microsoft. OMG itself employs very few people. Most of the work is 
carried out by members. 


[Figure: Object Management Architecture, showing Application Objects, Common Facilities, Object Services and the Object Request Broker] 


The first major result of the OMG has been the Object Management Architecture Guide (OMA). It 
describes an abstract object model, an architecture to implement distributed applications. The object 
management architecture is structured into four components: 

• Object Request Broker 

• Object Services 

• Common Facilities 

• Application Objects 

♦ Object Request Broker 

Communication between objects is done by requests and results. The application object is able to use 
services of the Common Facilities by sending a request to these objects. 

Requests are independent of the executing machine. The underlying network and machine architecture are 
fully transparent to the application. This is guaranteed by the Object Request Broker (ORB), the central 
component of this architecture. Services provided by the ORB are 

• the Name Service to identify an object within the network, 

• the Request Dispatch Service to decide which method of an object has to be called, 

• the Parameter Encoding to have a unique data representation over the network, 

• the Delivery Services to allow correct transport of requests and responses, 

• the Activation to implement persistent objects, 

• the Exception Handling to deal with error conditions and other unexpected situations, 

• and the Security Services to identify communication partners. 

♦ Object Services 

Object Services provide basic functions to manage objects within a network environment. These are 
functions to create and manage classes and instances and to implement persistent objects. On a local level 
this type of functionality is provided partially by object oriented programming languages. 

♦ Common Facilities 

Common Facilities provide all functions of general characteristics required by many applications. They 
can be seen as a common class library. The application developer can use them to create his own classes 
tailored to the specific requirements. 

♦ Application Objects 

Application Objects comprise all types of applications based upon the OMA that provide or use services 
of the system. Some types of applications are 

• CAD applications, 

• CASE systems, 

• network management applications. 

OSF DME (Distributed Management Environment) and USL Atlas are concepts based on OMA. 


Common Object Request Broker Architecture 

Two groups, with HP, NCR, Object Design and Sun in one and DEC and HyperDesk in the other, developed their 
own definitions of an ORB-like implementation. Within the Object Request Broker Task Force they 
agreed to a common strategy in 1991, the Common Object Request Broker Architecture (CORBA). It is a 
concrete object model based on the Object Management Architecture. The document describes a binding 
definition of an ORB. 

[Figure: Object Request Broker] 

Two different types of interfaces are provided for the client. Both types of interface have the same 
capabilities. 

The Dynamic Interface provides an interface repository to retrieve an interface definition at run time to 
call a method of an object. No static calls have to be created. The caller does not even need to know where 
the object is located (whereas ONC-RPC users have to know where the server is located!). 

The Stub Interface is usually generated from a description written in the Interface Definition Language 
(IDL). The IDL compiler creates the interface for both the client and server side. This mechanism is very 
similar to the rpcgen program creating RPC stubs and header files. 
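
The difference between the two client interfaces can be sketched in a single process. This is a toy illustration, not the CORBA API: the `Account` object, the `INTERFACE_REPOSITORY` dictionary, the `AccountStub` class and the `dynamic_invoke` function are hypothetical names, and no network or ORB is involved.

```python
class Account:
    """Hypothetical object implementation."""
    def __init__(self):
        self.balance = 0

    def deposit(self, amount):
        self.balance += amount
        return self.balance

# Toy interface repository: interface name -> operation -> parameter names.
INTERFACE_REPOSITORY = {"Account": {"deposit": ["amount"]}}

class AccountStub:
    """Stub interface: one method per operation, fixed when the stub is generated."""
    def __init__(self, target):
        self._target = target

    def deposit(self, amount):
        return self._target.deposit(amount)

def dynamic_invoke(target, interface, operation, **arguments):
    """Dynamic interface: look the operation up at run time, then call it."""
    parameters = INTERFACE_REPOSITORY[interface][operation]
    if sorted(arguments) != sorted(parameters):
        raise TypeError(f"{operation} expects parameters {parameters}")
    return getattr(target, operation)(**arguments)

account = Account()
print(AccountStub(account).deposit(10))
print(dynamic_invoke(account, "Account", "deposit", amount=5))
```

Both paths reach the same method, which reflects the statement that the two interface types have the same capabilities; they differ only in when the operation is bound.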

♦ IDL 

The object interface is specified as a formal description of a calling structure. The IDL syntax is very close 
to C++. A few extensions have been made to describe an interface in a distributed environment. IDL 
distinguishes between input, output and input/output parameters. An IDL interface definition is similar to 
a C++ class. Specifications of one interface can be inherited from other interfaces. 

A compiler translates the definition into the target language. The C language mapping is specified in the 
current CORBA document. It shows how to translate each IDL construct into C language. 

A public domain IDL compiler is available. It creates an intermediate language. 

♦ Object Adapter 

A generated server interface is used to call object implementations from the ORB. The object itself is able 
to request services from the ORB via the Object Adapter. The Object Adapter is the real interface to the 
ORB for an object implementation. 

Object adapters have the following responsibilities: 

• generation and interpretation of object references 

• method invocation 

• security of interactions 

• object and implementation activation and deactivation 

• mapping object references to the corresponding object implementations 

• registration of implementations 

Services not provided by a specific ORB core implementation have to be implemented in the object 
adapter to guarantee the same interface functionality to the object implementation. 

Any special requirement of an object implementation has to be implemented in the object adapter 
(for example registration of multiple objects of an OO database in one call). This also allows performance 
tuning. 

Examples of object adapters are: 

• Basic Object Adapter 

This adapter type can be used for most object types. 

• Library Object Adapter 

It is used for library objects linked to the client’s program. 

• Object-Oriented Database Adapter 
A special adapter tuned for OODBs. 

♦ Object Request Broker 

The CORBA document allows different implementation strategies for the ORB. Objects have very 
different requirements for how to interface to the ORB, depending on the object model used. 











The ORB is defined by its interface only. It can be implemented as a single component, but there is no 
requirement for it. 

Some examples of how to implement an ORB are: 

• Client- and implementation-resident ORB 

The ORB code is linked to the client. Interprocess communication (IPC) is used to access the object 
implementation. 

• Server-based ORB 

The ORB is implemented as one or more server programs. IPC is used to communicate with clients 
and object implementations. The ORB is responsible for routing of requests between client and 
object implementation. 

• System-based ORB 

The ORB is integrated into the operating system as a part of it. 

• Library-based ORB 

"Light-weight" objects can be implemented as a library and linked into the client code. In this case 
the stubs are the object implementation itself. The object implementation has to trust the client not 
to damage its data as they share the same address space. 

A client is not necessarily restricted to accessing one ORB only. If different ORBs are able to work together, 
the client may use them simultaneously. Different styles of object references have to be resolved by the 
ORB. 

The interface is organised into three categories: 

• operations that are the same for all ORB implementations 

• operations that are specific to particular types of objects 

• operations that are specific to particular styles of object implementations 

A request consists of 

• an operation (name), 

• a target object, 

• zero or more parameters (data for target object), 

• an optional request context with additional information about the request. 

An exception is returned, if any abnormal condition occurred during the execution of a request. A request 
may return a single result value. 
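
The parts of a request listed above map naturally onto a small record type. The sketch below is a toy, in-process stand-in for a broker, written only to make the structure concrete: `Request`, `Reply`, `ToyBroker` and `Counter` are hypothetical names, and a plain string plays the role of the object reference.

```python
from dataclasses import dataclass, field

@dataclass
class Request:
    operation: str                                  # operation name
    target: str                                     # object reference (a plain name here)
    parameters: tuple = ()                          # zero or more parameters
    context: dict = field(default_factory=dict)     # optional request context

@dataclass
class Reply:
    result: object = None         # the single result value, if any
    exception: Exception = None   # set if something abnormal happened

class Counter:
    """Hypothetical object implementation."""
    def __init__(self):
        self.total = 0

    def add(self, amount):
        self.total += amount
        return self.total

class ToyBroker:
    def __init__(self):
        self._objects = {}

    def register(self, reference, implementation):
        self._objects[reference] = implementation

    def invoke(self, request):
        """Locate the target, call the operation, and wrap the outcome."""
        try:
            implementation = self._objects[request.target]
            method = getattr(implementation, request.operation)
            return Reply(result=method(*request.parameters))
        except Exception as error:
            return Reply(exception=error)

broker = ToyBroker()
broker.register("counter-1", Counter())
print(broker.invoke(Request("add", "counter-1", (3,))))
print(broker.invoke(Request("add", "no-such-object", (3,))))
```

Returning the exception inside the reply, rather than raising it in the broker, mirrors the statement that an exception is returned to the requester when a request fails.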

♦ Client 

Objects are invoked by usage of the object reference, implemented as a language specific data type. The 
stubs convert this type, usually an opaque pointer, to the representation required by the ORB. 

Object references can be received as an output parameter from another object invocation or an input 
parameter if the client is an object implementation itself. A string representation of an object 
reference can be used for permanent storage. 

♦ Object Implementation 

A set of data and code is required to implement the object itself. The implementation can use other objects 
to implement its functionality. Some ways of implementation are 

• (OO) libraries, 

• servers, 

• a program per method, 

• an encapsulated application, 

• an object oriented database. 

Encapsulation of existing applications allows their integration into an object oriented environment and 
their usage together with newly developed OO applications. This approach will especially enable the 
further usage of the major mainframe applications within the object oriented world. 

♦ Basic Object Adapter 

A standard object interface is specified in CORBA, the Basic Object Adapter (BOA). It should be available 
in every ORB implementation. 

[Figure: structure of the Basic Object Adapter, showing the Object Implementation, the Basic Object Adapter and the Object Request Broker Core] 

The main services provided by the BOA are 

• Generation and interpretation of object references 
Unique references are used to identify an object. 

• Identification and authentication of the calling client 

• Activation and deactivation of an object implementation 
The BOA has to implement persistent objects. This means permanent storage of objects (hard disk, 
etc.) and write back of modified data. 

• Calls of methods by usage of generated stubs 
There are different ways to activate a method: 

• Each method is a separate program 

• Each object with all its methods is implemented as a program 

• A program implements an object and all its instances 

The BOA interface is mainly specified in IDL notation. 

There are four different ways to activate an object implementation, also called activation policies: 

• Shared server policy 

The BOA starts the server process for a number of object implementations when the first request is 
performed. As soon as the process is ready, it will register to the BOA. 

• Unshared server policy 

The server process implements only one object. 

• Server-per-method policy 

For each request, a new server is started. 

• Persistent server policy 

The server process is started outside the BOA. Otherwise it acts like a shared server. 
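
The four policies differ mainly in when a server process is started and how many objects it serves. The toy adapter below makes that difference countable. It is an illustration under stated assumptions, not the BOA interface: `Policy`, `ToyAdapter`, `server_for` and `register_persistent` are hypothetical names, and a "server" is just a string key rather than a real process.

```python
from enum import Enum

class Policy(Enum):
    SHARED = "shared server"
    UNSHARED = "unshared server"
    PER_METHOD = "server-per-method"
    PERSISTENT = "persistent server"

class ToyAdapter:
    def __init__(self):
        self.running = set()   # keys of server processes currently running
        self.starts = 0        # how many servers the adapter itself started

    def register_persistent(self, server):
        """Record a server that was started outside the adapter."""
        self.running.add(server)

    def server_for(self, policy, server, object_id):
        """Return the key of the server handling a request, starting one if needed."""
        if policy is Policy.PER_METHOD:
            self.starts += 1                   # a fresh server for every request
            return f"{server}#{self.starts}"
        if policy is Policy.PERSISTENT:
            if server not in self.running:
                raise RuntimeError("persistent servers are started outside the adapter")
            return server
        # Shared: one server for many objects. Unshared: one server per object.
        key = server if policy is Policy.SHARED else f"{server}/{object_id}"
        if key not in self.running:
            self.running.add(key)
            self.starts += 1
        return key

adapter = ToyAdapter()
for object_id in ("acct1", "acct2"):
    print(adapter.server_for(Policy.SHARED, "bank", object_id),
          adapter.server_for(Policy.UNSHARED, "shop", object_id))
print("servers started by the adapter:", adapter.starts)
```

In this sketch two objects under the shared policy cost one start, while the same two objects under the unshared policy cost two, which is the trade-off the policies expose.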

The means of communication between the BOA and the object implementation is not specified in CORBA. 
It is very dependent on the system and language environment used. The information required to 
connect to the object implementations is stored in the Implementation Repository. CORBA does not 
contain any further specification about the interface to it. 



♦ Multiple ORBs and Integration of Foreign Object Systems 

CORBA allows multiple Object Request Brokers. The same ORB (ORB1 in the next figure) can be 
active on different machines using the same object references and communication methods. They act like 
a single ORB. Object references can be transferred between the different machines. 

One client can use objects from more than one ORB (ORB1 and ORB2 in the figure). Object references can be 
passed as parameters between the different ORBs. Each ORB has to be able to distinguish between its own 
object references and those of other ORBs. 




If two machines do not have any common ORB, a gateway is required to translate the object references 
and requests. 
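
One way to picture how an ORB tells its own references from foreign ones, and where a gateway fits, is to tag each reference with the identifier of the ORB that issued it. The sketch below does this in a single process. The `ORB1:name` string format, the `ToyORB` class and its `gateways` table are assumptions made for illustration; CORBA 1.1 does not prescribe any of them.

```python
class ToyORB:
    def __init__(self, orb_id):
        self.orb_id = orb_id
        self.objects = {}     # local name -> object implementation
        self.gateways = {}    # foreign ORB id -> ORB reachable through a gateway

    def export(self, name, implementation):
        """Register an object and return the string form of its reference."""
        self.objects[name] = implementation
        return f"{self.orb_id}:{name}"

    def invoke(self, reference, operation, *arguments):
        orb_id, name = reference.split(":", 1)
        if orb_id == self.orb_id:                  # one of our own references
            return getattr(self.objects[name], operation)(*arguments)
        if orb_id in self.gateways:                # foreign, but a gateway exists
            return self.gateways[orb_id].invoke(reference, operation, *arguments)
        raise LookupError(f"no ORB or gateway for {orb_id!r}")

class Greeter:
    """Hypothetical object implementation."""
    def greet(self, who):
        return f"hello, {who}"

orb1, orb2 = ToyORB("ORB1"), ToyORB("ORB2")
reference = orb2.export("greeter", Greeter())
orb1.gateways["ORB2"] = orb2      # without this line the call below would fail
print(reference)
print(orb1.invoke(reference, "greet", "Perth"))
```

Because the reference is a plain string here, it could equally be written to disk and read back later, which matches the earlier remark that a string representation can be used for permanent storage.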


[Figure: Multiple ORBs] 



Today there are many "non CORBA" object systems available. To integrate them into a CORBA 
environment, three different approaches are possible: 

• Mapping of foreign system into ORB objects (BOA objects) 

The foreign object system is seen as one or more objects interfaced by a basic object adapter. 

• Usage of special object adapter 

If the BOA is not suitable as an interface, a special object adapter can be used as interface. 

• ORB Gateway 

The foreign object system is seen as ORB. A gateway translates between a CORBA ORB and the 
foreign ORB. 

[Figure: Different Ways to Integrate Foreign Object Systems] 













































Future Development RFI / RFP 

The CORBA 1.1 document has not covered a number of areas. To fill these gaps the OMG issues Requests 
For Information (RFI). The response is used to prepare a Request For Proposal (RFP). 

In January 1993 the OMG released the C++ Language Mapping Request For Proposal (RFP). 

The C++ mapping will be the first object-oriented language mapping for the CORBA standard. 

Special requirements are 

• reliability to make sure incorrect IDL code does not corrupt the IDL language mapping, 

• performance, 

• portability to be independent of any specific C++ implementation. 

The final specification is scheduled for end of 1993. 

A new CORBA 2.0 version is planned. Together with the C++ language mapping RFP, an RFI on ORB 2.0 
extensions has been released. Responses are due at the end of April 1993 and will be reviewed by the Technical 
Committee (TC) "Object Request Broker 2.0 Task Force" (ORB2TF). The next step will be an RFP, 
scheduled for July 1993. 

The addressed areas are: 

• Areas in CORBA deliberately left incomplete (repository APIs) 

• Additions to CORBA (additional language binding) 

• Extensions to core ORB (transaction, concurrency generation) 

Suggested response topics are: 

• Multi-media (images, animation, video, speech) as streams of data 

• Interoperability (high priority) 

Definition of interoperability 

Requirements and expectations for interoperability 

Expositions on general approaches to interoperability 

Discussion of specific mechanisms and protocols for interoperability 

• Other language bindings 

• Object adapters 

• Repositories 

Interface repositories (current CORBA: extract operations only, no insert, modify) 

Implementation repositories (no operations specified currently) 

• Transactions 

• Asynchronous messaging 

• Multi-endpoint interaction and replication 

• Concurrency 

• Relationship to standards (for example ODP effort in ISO) 

• Compliance and testability 

Further topics for RFIs/RFPs are language bindings for Ada, COBOL and Smalltalk, interoperability 
between ORBs, and asynchronous messaging. 




References 

1. The Common Object Request Broker: Architecture and Specification. OMG Document 
Number 91.12.1, Revision 1.1. 

2. Object Management Architecture Guide, Revision 2.0, September 1, 1992. OMG TC 
Document 92.11.1. 

3. C++ Language Binding Request For Proposals, December, 1992. OMG TC Document 
92.12.11 

4. Object Request Broker 2.0 Request for Information, December 1992. OMG TC Document 
92.12.10 







Homebrew Network Monitoring: 

A Prelude to Network Management* 


Mike Schulze, George Benko and Craig Farrell 
Department of Computer Science 
Curtin University of Technology 
Perth, Western Australia 

mike@cs.curtin.edu.au, rete@cs.curtin.edu.au, craig@cs.curtin.edu.au 

March 12, 1993 


Abstract 

A wide variety of public domain and commercial tools 
exist to help network administrators manage networks. 
Few of these tools achieve a satisfactory price/performance 
ratio for organisations with small budgets. To date, we have 
implemented a number of tools which allow us to examine 
and visualise network communications with a simple, 
intuitive, X based graphical user interface. These tools 
acquire knowledge on network communications via 
passive network monitoring, with a minimal amount of 
user intervention. This allows users who do not have an in- 
depth understanding of network architectures to gain an 
immediate, intuitive understanding of their network and its 
performance. 

1 Introduction 

With the sudden growth of UNIX™ based networks, 
a need for effective network diagnostic tools has arisen. 
Indeed, many tools have been implemented to help solve 
this problem, but are beyond the financial reach of network 
managers within small organisations. 

Management of a small to medium sized network 
need not be an expensive, complex undertaking. Some of 
the functions provided by specialized network 
management packages can be implemented by a sub-set of 
publicly available tools. Of these tools, some are simple 
load monitors; others, such as etherfind[33] and 
tcpdump[16], provide in-depth textual descriptions of local 
network traffic. These utilities may be sufficient for an 
experienced user, but can become cumbersome and/or 
uninformative in the hands of the uninitiated. 

In this project, we attempt to extend the goals of these 
utilities by visualising network data. This has been 
achieved by applying a graphical model to a collection of 
continuously updating network statistics. These statistics 
are gathered by promiscuously monitoring the local 
network. 

In the early stages of development, a number of tools 
emerged which collectively form the “Netman” project. 
The primary goal of the project is to provide an intuitive 
representation of network activity using graphical 
techniques. Several of the Netman tools have been inspired 
by non-text based network monitors like EtherView[14], 
NetVisualyzer[29], and traffic[36]. It is envisaged that 
Netman tools will supplement information provided by 
other monitoring tools rather than replace them. 

The development of this project was entirely under 
UNIX™: we have not considered other operating systems¹. 
At present we have implemented our work on Sun 
SPARCstation™ and DECstation™ 5000 series 
architectures. These platforms will be referred to as 
monitoring stations in the remainder of this paper. We 
chose to implement our work on these workstations for 
reasons similar to those outlined in [24]. 

All of our efforts are concentrated on monitoring 
Ethernet™[22] networks. A typical university Ethernet, 
such as ours, provides us with a good environment in which 
to develop our project as it exhibits a wide variety of 
protocols. 

There are two methods of passive network 
monitoring: real time analysis and retrospective 
analysis[2]. Our initial work concentrated on real-time 
traffic analysis, which forms the bulk of this paper. To a 
lesser extent, we explore an implementation of 
retrospective analysis in the form of a protocol analyser. 


1. This is not to say that network traffic generated by hosts not 
running a UNIX derivative will be ignored. 

*. This is a preprint of a paper to be presented at the SANS 
II World Conference on System Administration, Networking, 
and Security, April 18-23, 1993, Arlington, Virginia. 




2 Statistics via Passive Monitoring 

Passive network monitoring can provide a good 
insight into common network configuration problems. 
Examining the “wire” can often reveal problems such as 
protocol mismatches, incorrect routes, broadcast storms 
etc. We chose the passive approach because we were 
interested in monitoring normal network operations 
without interference from either a custom or established 
monitoring/management² protocol. That is, the act of 
observing a network should not directly interfere with or 
add to network activity. 

Having access to local network data allows us the 
freedom to provide a multitude of statistical information. In 
this way, we are limited only by the amount of network 
data we can process “on the fly”. A combination of 
specialised data structures and a reasonably reliable 
network tap forms the basis of the monitoring engine. 

2.1 Packet Capture Mechanisms 

User level access to physical network traffic requires 
an efficient mechanism which is able to interact with the 
monitoring station’s network interface device drivers. 
Some modern implementations of UNIX have such a 
mechanism, e.g. Sun’s Network Interface Tap (NIT)[34] in 
SunOS™, Snoop[30] in IRIX™, and the Ultrix™ 
Packetfilter[9]. All of these implementations are 
derivatives of the original “packet filter” developed at 
Carnegie-Mellon University in 1980[25]. At present we 
use the Ultrix packetfilter and NIT under SunOS. Although 
there has been a recent paper[19] outlining the inefficiencies 
of these mechanisms, they are standard system resources 
and provide sufficient performance and functionality for 
our work. 

Data is captured by placing the network tap[19] into 
promiscuous mode and collecting statistics from the 
incoming packet headers. Information from each buffered 
queue, or “chunk” of packets, is decoded sequentially and 
distilled into the appropriate statistical data structure. No 
attempt has been made to make the packet decoding 
routines protocol independent; all protocol specific 
information is hard coded. 
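
The header decoding step can be illustrated with a minimal Python sketch. This is not the Netman code, which reads chunks of packets from NIT or the Ultrix packetfilter; the function names and the small frame-type table are assumptions made for illustration. The sample header bytes are those of the telnet packet shown in Figure 7.

```python
import struct

# Illustrative subset of Ethernet frame types (assumed table, not Netman's).
ETHERTYPES = {0x0800: "IP", 0x0806: "ARP"}


def mac_to_str(raw):
    """Format a 6-byte hardware address as colon-separated hex."""
    return ":".join("%02x" % b for b in raw)


def decode_ethernet(frame):
    """Return (dest, src, frame_type) from the first 14 bytes of a frame."""
    if len(frame) < 14:
        raise ValueError("frame shorter than an Ethernet header")
    # Network byte order: 6-byte destination, 6-byte source, 16-bit type.
    dest, src, frame_type = struct.unpack("!6s6sH", frame[:14])
    return mac_to_str(dest), mac_to_str(src), frame_type


# Destination, source and type fields of the packet in Figure 7.
header = bytes.fromhex("0000b504042d" "aa000400eb07" "0800")
dest, src, frame_type = decode_ethernet(header)
print(dest, src, ETHERTYPES.get(frame_type, hex(frame_type)))
```

Hard coding the field layout in this way mirrors the protocol-specific decoding described above: each additional protocol needs its own unpacking routine.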

2.2 Determining Logical Network Connections 

Finding connections between hosts forms the basis of 
our work. The method is simple: examine the source and 
destination fields of each passing packet. Once a source / 
destination address pair is obtained, it is stored in a 


2. We are not attempting to criticise SNMP[4] in any way. The 
goal of the Netman project is to provide statistics on normal 
network operations as opposed to providing a management 
infrastructure. Although we realise SNMP MIB II[20] partially 
provides the monitoring functionality, it is lacking in 
per-protocol statistics. 


continuously updated list of connected nodes. Each node 
which is identified as a source of network traffic, will have 
an associated list of connections to destination nodes. Both 
lists are updated as often as possible, in our case after each 
chunk of packets is read. 

A node is never deleted from the list. Therefore the list 
is limited only by the number of hosts on the network and 
the system resources of the monitoring station. An idle 3 
host is not removed from the list; instead it is flagged as 
idle. This way, if the host were to transmit again, the idle flag 
could be toggled and the previously determined host 
statistics would still be available. 
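
As a rough illustration of this bookkeeping, the sketch below keeps one entry per source node, each with its own list of destinations, and flags idle nodes instead of deleting them. The class names, the `idle_after` parameter and the use of a dictionary are assumptions for the example; the paper does not describe the actual data structures at this level.

```python
class Node:
    """One source of network traffic and the destinations it has contacted."""

    def __init__(self, address):
        self.address = address
        self.links = {}       # destination address -> packets seen
        self.last_seen = 0.0  # timestamp of the most recent packet
        self.idle = False


class NodeTable:
    """List of connected nodes; entries are flagged idle, never removed."""

    def __init__(self, idle_after=60.0):
        self.nodes = {}
        self.idle_after = idle_after  # user-chosen idle threshold, in seconds

    def record(self, src, dst, now):
        """Note one packet from src to dst observed at time now."""
        node = self.nodes.get(src)
        if node is None:
            node = self.nodes[src] = Node(src)
        node.links[dst] = node.links.get(dst, 0) + 1
        node.last_seen = now
        node.idle = False  # a transmitting host is no longer idle

    def flag_idle(self, now):
        """Mark nodes that have been silent for longer than idle_after."""
        for node in self.nodes.values():
            node.idle = (now - node.last_seen) > self.idle_after


table = NodeTable(idle_after=10.0)
table.record("pride", "covet", now=0.0)
table.record("pride", "narrows", now=1.0)
table.flag_idle(now=20.0)                 # pride is flagged, not deleted
table.record("pride", "covet", now=21.0)  # earlier totals are still there
```

Because the entry survives the idle period, the counts gathered before the pause remain available when the host transmits again.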

2.3 Data Flow Calculations 

In order to extract useful information from network 
traffic, we have outlined a set of statistical categories which 
we consider important: 

• per host source traffic flow 

• per host protocol summaries 

• per link traffic flow 

• per link protocol summaries 

• overall bandwidth consumption. 

These calculations must be made each time a chunk is 
processed. This means that we must maintain a list of 
running totals each time a packet is decoded and then 
perform the calculations. 

After each chunk is processed, a chunktime[9, 34] or 
time frame is established by finding the difference between 
the timestamps associated with the first and last packets in 
the chunk 4. This value serves as a point of reference for all 
flow statistics generated from the running totals. 

Per host source traffic totals are tallied by examining 
the source and length fields of each packet, then 
incrementing byte and packet totals for the corresponding 
node. These totals are then divided by the chunk time to 
yield a result which is stored in a sliding buffer of finite 
length. From this brief history of traffic statistics we can 
obtain a “sliding average” which is used to represent source 
traffic for that node. Link statistics are handled in a similar 
way, as running totals for link flows are also maintained. 
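
The calculation for a single node can be sketched as follows. The sketch assumes that the byte total is reset at the end of each chunk and that the history holds a fixed number of per-chunk rates; the class name, the `history` length and the reset policy are illustrative assumptions rather than details taken from the implementation.

```python
from collections import deque


def chunk_time(timestamps):
    """Time frame of a chunk: last packet timestamp minus the first."""
    return timestamps[-1] - timestamps[0]


class SourceFlow:
    """Per-node source traffic, averaged over a short history of chunks."""

    def __init__(self, history=8):
        self.byte_total = 0
        self.rates = deque(maxlen=history)  # sliding buffer of finite length

    def add_packet(self, length):
        """Add one decoded packet to the running total for this chunk."""
        self.byte_total += length

    def end_chunk(self, elapsed):
        """Convert the running total into a rate and store it."""
        if elapsed > 0:
            self.rates.append(self.byte_total / elapsed)
        self.byte_total = 0

    def sliding_average(self):
        """Average of the stored rates, in bytes per second."""
        return sum(self.rates) / len(self.rates) if self.rates else 0.0


flow = SourceFlow(history=2)
for length in (100, 100):
    flow.add_packet(length)
flow.end_chunk(chunk_time([10.0, 10.2, 10.5]))  # 200 bytes in 0.5 s
flow.add_packet(60)
flow.end_chunk(0.5)                             # 60 bytes in 0.5 s
print(flow.sliding_average())
```

A bounded `deque` discards the oldest rate automatically, so the average always reflects only the most recent chunks.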

Protocol summaries are more difficult: depending on 
the protocols considered most interesting to the user, each 
protocol layer must be decoded to the required level. For 
example, providing the frame type[8] of each Ethernet 
packet is not very useful if the network is predominantly 


3. The term “idle” is used loosely; the user must decide the 
parameters which class the host as idle. 

4. This value is slightly inaccurate because we ignore the time 
gap between consecutive packets that fall in adjacent chunks. 
Although we realise that this value is not correct[24], it is 
sufficient for our purposes. 





IP[28]. For this reason, we have introduced extra decoding 
routines that will provide summaries for UDP and TCP port 
connections. Although this involves more work for the 
monitoring station, we believe that the usefulness of this 
information outweighs the extra load. 

Producing these summaries requires a moderately 
complex tree structure which is associated with each link 5. 
The root of the tree is a pointer to a list of the Ethernet 
protocols being used on a link, each with its associated byte/ 
packet totals. Each frame type may have an associated 
sub-protocol, so in the case of IP, sub-protocols may include 
TCP, UDP etc. Following in this tradition, each 
sub-protocol may have a sub-sub-protocol, which we have 
termed the port type, solely for the TCP/IP suite. These 
statistics are running totals which are never converted to 
flow rates, but simply updated after each packet. 
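
One way to picture such a tree is the sketch below, in which every level (frame type, sub-protocol, port type) carries its own packet and byte totals. The class and method names are hypothetical, and a dictionary of children stands in for whatever list structure the tools actually use.

```python
class ProtoNode:
    """A protocol level on one link, with running packet and byte totals."""

    def __init__(self):
        self.packets = 0
        self.bytes = 0
        self.children = {}  # sub-protocol name -> ProtoNode

    def update(self, path, length):
        """Add one packet to every level named in path, outermost first."""
        node = self
        for name in path:
            if name not in node.children:
                node.children[name] = ProtoNode()
            node = node.children[name]
            node.packets += 1
            node.bytes += length

    def dump(self, depth=0):
        """Return indented summary lines for the whole tree."""
        lines = []
        for name in sorted(self.children):
            child = self.children[name]
            lines.append("%s%s pkts=%d bytes=%d"
                         % ("  " * depth, name, child.packets, child.bytes))
            lines.extend(child.dump(depth + 1))
        return lines


link = ProtoNode()  # root for a single source/destination link
link.update(["IP", "TCP", "telnet"], 60)
link.update(["IP", "UDP", "nfs"], 158)
link.update(["IP", "TCP", "telnet"], 72)
link.update(["ARP"], 60)
print("\n".join(link.dump()))
```

The totals are only ever incremented, matching the description of running totals that are not converted to flow rates.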

2.4 Limitations of Passive Monitoring 

Restricting ourselves to passive monitoring means the Netman 
tools can only access data on the local segment; other 
networks remain “invisible”, i.e. cannot be monitored. Sun 
has attempted to solve this problem by writing a simple RPC 
service, etherd[32], which allows a monitoring station to 
connect to a host on another network and retrieve traffic 
statistics from the remote network. Unfortunately, our 
requirements were more complex than calculating a 
bandwidth usage statistic every second. We considered 
writing our own customised RPC service. However, this 
would have contradicted our requirement for passive 
monitoring, as such a service would inevitably generate a 
significant amount of additional traffic. Silicon Graphics 
takes a similar approach with snoop. 

One of the limitations of passive monitoring is not 
being able to request information from the network. For 
example, SNMP will allow a management station to query 
a device (which is running an SNMP agent) to retrieve the 
requested MIB variables. All that can be obtained from a 
monitor are cumulative statistics concerning the active 
devices on the local network. This means that simple 
questions like “is my host up?” cannot be answered if that 
host is idle 6. 

Another inherent limitation on monitoring is that we 
are unable to detect physical level devices such as 
repeaters. That is, a fault occurring on a repeater segment 
will only be detected via an inability to reach hosts on that 
segment. Of course a traditional method can be used, such 
as sending an ICMP echo and waiting for a reply. 


5. It is not the intention of this paper to provide a detailed 
account of the internal data structures we have implemented, 
so the discussion will be limited to a general overview of the 
statistical structures. It is useful to note, however, that each 
node has an associated list of links, and summaries for each 
node can be derived from this information. 

6. However, rwhod will make a host “visible” if broadcasts are 
being monitored. 


2.5 Effective Monitoring Strategies 

The use of bridges can restrict the amount of data which 
a monitoring station can access. A local Ethernet may be 
viewed by some as a single length of cable rather than a 
conglomeration of segments and bridges, but the use of a 
bridge 7 to fragment a network may serve as a “firewall” to 
the monitoring station (see Figure 1). 


[Diagram: Segment 1, Segment 2 and Segment 3 joined by two 
bridges, with the monitoring station attached to Segment 3.] 


Figure 1. Using bridges to segment an Ethernet. 


In this case the monitoring station will only see the 
traffic on or destined for segment 3. That is, even though 
the traffic on segments 1 and 2 may be considered local, the 
monitoring station can only see what is on the immediate 
segment. Traffic from segments 1 and 2 will appear to 
come from the Ethernet interface of the bridge connected to 
segment 3. 

Depending on the network activity that is considered 
most important, the monitoring station must be placed on a 
segment which is most likely to have the highest proportion 
of interesting traffic. For example, if remote IP traffic was 
of major concern, the monitoring station would be best 
placed on a centralised backbone which has a high 
exposure to foreign traffic. 

2.6 Assumptions made on Physical Transmissions 

Accessing data at the link level remains a problem, 
even if the workstation is capable of processing network 
data as fast as it arrives. The information being transmitted 
on the physical Ethernet will not always appear at the link 
level regardless of machine processing speed. The physical 
level hides collisions, checksums (valid or otherwise), 
corrupted frames etc. from the protocol level accessed. 

Calculated load average, which represents the actual 
level of network traffic, must be as close as possible to the 
correct value. This means that certain assumptions about 
the data received are required. For example, a received 
packet is assumed to have had an associated preamble, 
frame check sequence, and an inter-frame gap[1]. This 
assumption compensates for the fact that these pieces of 
“lost” information are left at the physical level at the time 
of transmission (see Figure 2). Some network monitors fail 
to take this into account, creating a slightly inaccurate 
calculation of bandwidth usage, e.g., xtr[27]. 
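
The compensation can be written out as a short calculation. The sketch below assumes that the tap delivers each frame without its preamble and frame check sequence, and uses the field lengths of Figure 2 (8 bytes of preamble, 4 bytes of FCS, 12 bytes of inter-frame gap) together with a line speed of 10^7 bits/sec. The function names are illustrative only.

```python
PREAMBLE = 8           # 7 bytes of preamble plus 1 start-of-frame byte
FCS = 4                # frame check sequence
INTER_FRAME_GAP = 12   # minimum gap, expressed in byte times
ETHERNET_BITS_PER_SEC = 10 ** 7


def wire_bytes(captured_length):
    """Bytes of wire time consumed by a frame of the given captured length."""
    return captured_length + PREAMBLE + FCS + INTER_FRAME_GAP


def utilisation(captured_lengths, interval):
    """Fraction of the Ethernet bandwidth used during interval seconds."""
    bits = 8 * sum(wire_bytes(n) for n in captured_lengths)
    return bits / (ETHERNET_BITS_PER_SEC * interval)


# One thousand minimum-size (60 byte) frames observed in one second.
frames = [60] * 1000
print("naive:     %.4f" % (8 * sum(frames) / ETHERNET_BITS_PER_SEC))
print("corrected: %.4f" % utilisation(frames, 1.0))
```

For small frames the 24 bytes of hidden overhead are a large fraction of the total, which is why ignoring them understates the load most on networks dominated by short packets.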


7. Most modern bridges or level-2 routers[38] will only route 
packets if the destination Ethernet address is located on the 
other side of the bridge relative to the source device. 








Physical Level:   Preamble (7+1) | dest (6) | src (6) | type (2) | Data (0-1500) | pad (0-46) | FCS (4) | Inter-frame gap (12) 

Data Link Level:  dest | src | type | Data 

(field lengths in bytes) 


Figure 2. Frame lengths at different levels. 


3.1 Modelling Network Traffic 

Our approach to traffic visualisation is to create a 
graph of network activity. To provide an accurate model of 
the network, the nodes, depicted as circles, represent 
devices, and edges, shown as line segments, represent 
logical connections between network devices. This graph is 
connected[37], that is, any pair of nodes is directly 
reachable from one another. 


There is no facility within the interface “tap” to detect 
Ethernet collisions, therefore there is no way of 
determining the amount of information transmitted during 
a collision sequence 8 9 . This is yet another contributor to the 
overall problem of not being able to represent network 
traffic accurately. 

Truncated or “runt” packets, that is packets with a 
length less than 60 bytes, should be treated as normal 
packets. This depends on whether the interface tap will 
pass the packets as valid transmissions. Since we have 
never encountered such a packet, we cannot confirm 
whether they are treated as a normal transmission. 
Misaligned packets have not been considered. 

2.7 Packet Filtering 

Packet filtering is the ability to focus on a specific 
protocol or a set of protocols, whilst the physical network 
interface is promiscuously capturing network data. Most 
network taps are able to filter packets at data link level and 
although Ultrix Packetfilter and NIT have this facility, we 
do not make use of it in our implementation. However, in 
order to avoid potential future problems when dealing with 
differences between filtering schemes across would-be 
platforms, and to provide access to all network traffic (used 
for calculating bandwidth statistics), we have implemented 
our own set of filters. 
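
The idea of filtering above the tap, while still counting every packet for bandwidth purposes, might look like the following sketch. The packet representation (a dictionary with `src`, `dst`, `type` and `length` keys) and the function names are assumptions made for the example, not the filters used in the tools.

```python
def make_filter(frame_types=None, hosts=None):
    """Build a predicate that accepts packets of interest.

    frame_types: set of Ethernet frame types to accept, or None for all.
    hosts: set of host names; a packet passes if either end is listed.
    """
    def accept(packet):
        if frame_types is not None and packet["type"] not in frame_types:
            return False
        if (hosts is not None and packet["src"] not in hosts
                and packet["dst"] not in hosts):
            return False
        return True
    return accept


def process(packets, accept):
    """Count every packet for bandwidth, but keep only the accepted ones."""
    total_bytes = 0
    selected = []
    for packet in packets:
        total_bytes += packet["length"]  # bandwidth sees all traffic
        if accept(packet):
            selected.append(packet)
    return total_bytes, selected


packets = [
    {"src": "pride", "dst": "covet", "type": 0x0800, "length": 158},
    {"src": "narrows", "dst": "cujo", "type": 0x0800, "length": 60},
    {"src": "pride", "dst": "covet", "type": 0x0806, "length": 60},
]
total, shown = process(packets, make_filter(frame_types={0x0800},
                                            hosts={"covet"}))
print(total, len(shown))
```

Keeping the filter in user code means the same predicate works on any platform, at the cost of copying unwanted packets out of the kernel.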


Since typical networks have many devices 
sporadically but frequently communicating with each 
other, a representation of the network quickly evolves into 
a sprawl of lines and circles unless there is some form of 
logical node ordering and labelling. We chose to arrange 
the graph in a circular manner, with all nodes equidistant 
(this idea is similar to the approach taken by NetVisualyzer 
and EtherView). Each node has a unique label which 
distinguishes it from other nodes. The positioning of the label 
follows a perpendicular to the tangent of the network 
“circle” at the node position (see Figure 3). 



Figure 3. Node labelling and distribution. 
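
The placement rule can be expressed in a few lines. In this sketch nodes are spaced at equal angles around a circle and each label is pushed outwards along the radius, which is the perpendicular to the tangent at that point. The function name, the unit radius and the `label_offset` parameter are assumptions for illustration.

```python
import math


def layout(labels, radius=1.0, label_offset=0.1):
    """Place nodes evenly on a circle; labels sit outside, along the radius."""
    placed = []
    count = len(labels)
    for index, label in enumerate(labels):
        angle = 2.0 * math.pi * index / count
        placed.append({
            "label": label,
            # Node centre on the network circle.
            "x": radius * math.cos(angle),
            "y": radius * math.sin(angle),
            # Label anchor further out along the same radius.
            "label_x": (radius + label_offset) * math.cos(angle),
            "label_y": (radius + label_offset) * math.sin(angle),
            # Text rotation in degrees, following the radius.
            "label_angle": math.degrees(angle),
        })
    return placed


placed = layout(["pride", "covet", "narrows", "cujo"])
for node in placed:
    print("%-8s (%.2f, %.2f) rotate %.0f"
          % (node["label"], node["x"], node["y"], node["label_angle"]))
```

Rotating each label along its radius keeps the text clear of the edges drawn inside the circle, which is why rotatable fonts matter for this display.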

A problem arises when we consider that each edge of 
the graph is undirected. That is, one host could send data 
destined for another, and the destination does not respond. 
Our model will create a connection between the two nodes 
but no indication of direction will result. Initially, we 
considered this a problem until we decided to identify the 
source and volume of traffic for each node. 


3 Visualising Network Data 

One of the goals of this project was to provide tools 
with graphical representations of network traffic that are 
easy to understand, yet complex enough to retain all of the 
relevant information. Although it is hard to say if any single 
graphical representation could be useful in all situations 
[24], we believe a depiction exists for network 
communications which involve source/destination pairs. 
The remainder of this section outlines a model which is 
general enough to display overall network communications 
in a given network. 


8. Unfortunately an assumption of 10 * 1024 * 1024 bits/sec 
was also made, which is not the correct speed of 10^7 bits/sec. 

9. Although there is a way of determining the number of 
collisions on the interface since the machine was booted, this 
information is not of much use if a time-stamped collision 
indicator cannot be obtained. 


Each node represents a network device which 
generates intermittent transmissions. The area of the node 
is a function of the source traffic generated by that device. 
That is, the source traffic for each device is directly 
proportional to the area of the circle, so devices generating 
a higher proportion of traffic will be more prominent. 

Since each device may have multiple connections, 
line thickness may be used to represent the amount of 
traffic on a link between two hosts. Other packages, such as 
EtherView, display traffic statistics between hosts by a 
colour coding scheme. We have found this difficult to 
understand when the display update time was less than a 
second, and more than a few colours were used. 
NetVisualyzer uses colour intensity to represent traffic flow 
on a link, that is, a single colour is used and only the 
intensity of the link is varied. We believe our approach is 
more intuitive; if the link gets wider, more data is being 
transmitted. 
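
Both scaling rules are simple to state numerically. Since the area of a node is proportional to its source traffic, its radius grows with the square root of the traffic; link width is taken here to grow linearly with link traffic, clamped to a sensible pixel range. The scale factors and limits below are assumed values for illustration, not those used by the tools.

```python
import math


def node_radius(source_rate, area_per_unit=1.0):
    """Radius of a circle whose area is proportional to source traffic."""
    return math.sqrt(area_per_unit * source_rate / math.pi)


def link_width(link_rate, width_per_unit=0.001, minimum=1, maximum=20):
    """Line width in pixels, proportional to link traffic and clamped."""
    return max(minimum, min(maximum, round(width_per_unit * link_rate)))


for rate in (1000.0, 4000.0, 16000.0):
    print("rate %7.0f  radius %6.2f  width %2d"
          % (rate, node_radius(rate), link_width(rate)))
```

Quadrupling the traffic doubles the radius, so a busy host stands out without swamping the display.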


















The final aspect of our model involves the 
identification of dominant protocols on each link. A 
protocol is considered dominant if it has the greatest byte 
aggregate from all protocol transmissions for any given 
link. A pre-determined colour table assigns each protocol 
with a configurable colour code. Once the dominant 
protocol has been determined, the colour table is consulted 
and link colour applied. 
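
The selection rule reduces to taking the protocol with the largest byte total on a link and looking it up in the colour table. The table contents and the default colour in this sketch are hypothetical; in the tools the table is configurable.

```python
# Hypothetical colour table; the real table is user configurable.
COLOUR_TABLE = {"IP": "red", "DECnet": "blue", "ARP": "green"}
DEFAULT_COLOUR = "grey"


def dominant_protocol(byte_totals):
    """Return the protocol with the greatest byte aggregate on a link."""
    if not byte_totals:
        return None
    return max(byte_totals, key=byte_totals.get)


def link_colour(byte_totals, table=COLOUR_TABLE):
    """Colour for a link, taken from the table entry of its dominant protocol."""
    return table.get(dominant_protocol(byte_totals), DEFAULT_COLOUR)


totals = {"IP": 5000, "ARP": 120, "DECnet": 900}
print(dominant_protocol(totals), link_colour(totals))
```

Note that dominance is judged on bytes rather than packets, so a few large transfers outweigh many small control packets.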

So far we have assumed a network protocol that is 
bound to local communications only. If communications 
between hosts on differing networks are accessible 10 , an 
augmentation of our model can be used to group hosts into 
a logical ordering of networks. This makes it possible to 
represent communications within network layer protocols, 
such as IP or DECnet. 

We feel that the overall model just presented is 
sufficient to provide a good picture of arbitrary network 
communications. We also feel that the model should not be 
expanded past its present form, for fear of introducing 
redundant complexities. For example, if arrows were used 
to imply direction of data flow on a given link, in most 
cases this information would be redundant as the majority 
of communications will involve a valid source / destination 
pair with bi-directional flow. 

The question of whether or not certain statistics are 
relevant on a crowded display will always cast doubt on our 
claim that the model is general enough to show all network 
connections. In practice, we have taken measures to avoid 
screen overcrowding by implementing host/network time-out 
and compression mechanisms. 

3.2 Animating the Model 

Providing instantaneous feedback of network 
activities is an essential part of real time fault diagnosis. 
The original implementation of our model was conceived 
on a Silicon Graphics 4D/20 workstation using the native 
gl[31] library. This platform provided the necessary 
graphical resources and the network tap we required to 
produce a fast working implementation. 

The gl library made it possible to develop a set of 
display routines which simply redraw the network as often 
as possible. That is, the program gathers network statistics and 
redraws the network ad infinitum. This version was only capable of 
monitoring connections between network devices. 

Once we were satisfied that the code had matured 
enough to warrant porting, it was decided to implement the 
code using the X11 Window System[23] to ensure 
graphical portability across platforms. The prospect of 
converting our carefully crafted display routines to 


10. For example, monitoring IP communications can often 
reveal source / destination pairs that do not reside on the local 
network. 


Xlib[10] calls prompted us to search for an alternative 
method. After careful consideration we decided that we 
would use the vogl[13] library to integrate X with the gl 
calls we had grown fond of. Fortunately we had only made 
use of a few gl calls, all of which were supported by vogl. 

3.3 Display Implementation 

Bringing our ideas to life proved to be a balancing act 
between monitoring station resources and portability 
considerations. Our first implementation allowed much 
freedom when it came to animation, but lacked portability. 
Since we wanted to create a tool which contained our 
animated display as well as menus, push-buttons, graphs 
etc., we decided to explore the X Toolkit[21] to help solve 
our problem. Fortunately, we were able to implement a 
working version using Xt, vogl and the Hershey[12] font 
libraries 11. Initially we chose the Athena Widget set from 
MIT[26] as it is publicly available and provided the 
Widgets we required. Currently, the Xaw3d[17] 
Widget set is used, purely for aesthetic reasons. 

The present implementations of our work provide us 
with a set of robust, informative monitoring tools. 
Surprisingly, the display is dynamic enough to show 
network traffic in real time, without any significant lag 
between burst and display. Unfortunately this is not the 
case when the X display and the monitoring station are 
several networks apart. 

4 Implementations 

A need for informative, easy to use network 
monitoring tools within the international UNIX community 
was becoming more apparent. This led us to experiment 
with public domain and system resident tools which have 
helped us to achieve what we have. Most of the tools that we 
found are text based, e.g. NNStat[3], Tcpdump, etherfind, 
nfswatch[5] etc., and provide real-time statistics on 
network traffic. Although these tools have proved 
invaluable, we felt a graphical approach had far more to 
offer in terms of providing a user with a picture of all 
network traffic. 

4.1 Monitoring All Ethernet Traffic: etherman 

Our first implementation was directed at monitoring 
all Ethernet host connections in real-time. Later, we added 
support for displaying the amount of source traffic from 
each node, and the amount of traffic on each host pair link. 
After testing etherman on different networks, it became 
obvious that we needed to be able to concentrate on 
“important” network activity by adding time-out and 
scaling functions. For example, a quiet network will appear 


11. The Hershey libraries were necessary because vogl does 
not support any form of text display. This also provided the 
font rotation which was available under gl. 



static or uneventful, but it is possible to scale nodes and 
links to allow a better picture of relative network 
communications. 

Network statistics are produced in both textual and 
graphical form. Dynamic statistics such as host and link 
traffic flow are displayed via our graphical model of 
network communications in real-time (see Figure 4). If a 
particular network phenomenon occurs during the course 
of execution, a PostScript™ snapshot of the network can be 
taken. Bandwidth consumption statistics are sampled 
once every second and displayed via a scrolling stripchart 
widget. 

A textual dump of protocol summaries collected for 
each host and link is available. The information given is for 
the run-time of the program. These summaries are 
currently decoded for all Ethernet frame IDs and all IP 
protocols. The output format for each host pair indicates 
the amount of data exchanged in bytes and packets, and the 
protocol in use. 


We have used etherman to detect a variety of network 
problems and shortcomings. Because of its graphical 
nature, problems such as excessive bandwidth 
consumption and broadcast storms become easy to 
identify. It is also very useful for finding unexpected 
transmissions or unknown devices. 

4.2 Focusing on IP Connectivity: interman 

Using the augmented display model we can monitor a 
network level protocol, in this case IP, to display 
connectivity and dominant sub-protocols. In this way, 
networks are ordered in a circular manner much the same 
way as Ethernet devices in etherman. Unfortunately, this 
model will not display routing information as IP routing is 
transparent at this level. 

As with etherman, we monitor and display traffic in 
real-time with update times varying because of traffic flow 
and monitoring station capabilities. As can be seen from 
Figure 5, different networks are shown as circles with hosts 



Figure 4. etherman monitoring all local traffic 







































local to each network listed around their respective 
circumferences. Hosts are connected between two 
networks via a single link; the colour of the link denoting 
the dominant protocol. Hosts and networks appear on the 
diagram as communications are monitored, and disappear 
when a host or network has been idle for a configurable 
period of time. 

Figure 6 shows a sample summary of selected 
transmissions for host cujo on a per host, protocol 
breakdown. Since these summaries are derived from link 
statistics, we provide all communications involving the 
said host. This means that some information will become 
redundant when summaries are generated for each host 
involved in this example. 

Several operations that extend beyond the scope of the 
original work have been included. These include finger, 
telnet, ping, network compression, traceroute, a modified 
fping, basic SNMP information, and a screen refresh. These 
options seem to contradict the notion of passive monitoring 


because they create traffic, but in practice these sorts of 
tools become useful when it becomes necessary to probe 
for network or host information. Other miscellaneous 
options such as protocol summaries, filtering, postscript 
dumps, and varying node and link time-outs are available 
via menu options and scroll bar adjustments. 


Summary of Communications for host cujo.curtin.edu.au 

From cujo.curtin.edu.au to ubvmsb.cc.buffalo.edu 
    TCP_SMTP     pkts = 8     (0.56 K) 
From butler.acsu.buffalo.edu to cujo.curtin.edu.au 
    UDP_DOMAIN   pkts = 1     (0.08 K) 
From cujo.curtin.edu.au to uniwa.uwa.edu.au 
    TCP_NNTP     pkts = 489   (29.08 K) 
From uniwa.uwa.edu.au to cujo.curtin.edu.au 
    TCP_NNTP     pkts = 731   (349.04 K) 
From cujo.curtin.edu.au to wvnvaxa.wvnet.edu 
    TCP_SMTP     pkts = 9     (1.34 K) 
    UDP_DOMAIN   pkts = 1     (0.08 K) 
From cupid.curtin.edu.au to cujo.curtin.edu.au 
    UDP_NTP      pkts = 4     (0.35 K) 
    TCP_NNTP     pkts = 12    (0.73 K) 
    TCP_LOGIN    pkts = 98    (5.74 K) 

Figure 6. Sample host protocol summary. 



Figure 5. interman showing IP communications on the Curtin University campus backbone. 



























4.3 Packet Analysis: packetman 

Branching away from the notion of real-time traffic 
analysis, we decided to implement a retrospective packet 
analyser. In practice, packetman can be used to decipher 
packet trains which are buffered, and optionally stored for 
future reference. A protocol analyser has the advantage of 
being able to decompose headers and examine protocol 
transactions in great depth. Our implementation, while far 
from complete, provides a good base from which we can 
work to provide comprehensive protocol analysis. 

The display is segmented into three windows each 
providing a different view of captured network data. The 
top window is a sequential trace of captured data. Note that 
a filter may be applied before commencing a trace to allow 
the user to focus on transactions of interest. Each packet 
has an associated sequence number, timestamp, source/ 
destination pair, and a brief overview of the protocol 
contained within (see Figure 7). 


A textual description of all decodable protocol fields 
within the selected packet is displayed in the middle 
window. At present, decomposition for Ethernet frame 
types, selected IP/UDP and IP/TCP, ARP and ICMP 
protocols has been implemented. The lower window 
gives a simple hexadecimal and ASCII dump of the entire 
packet. 

5 Future Directions 

5.1 Enhancements to Existing Tools 

With the addition of extra data structures and protocol 
analysis routines, we have found that a reduction in 
performance has resulted. Fortunately, we have several 
options available to help improve overall display 
performance and statistical accuracy. 

A recent paper[19] has outlined a new packet 
capturing mechanism, BPF[18], that claims a performance 


[packetman window, reconstructed from the screen capture; window title truncated in the original] 

Menu: Quit | Filter | Capture | Save | Load | Options | Clear 

Trace window (number, time stamp, len, src addr, src host -> dest host, dest addr, protocol): 

39  21/09/92 16:48:33.616764   60  aa:00:04:00:eb:07  narrows -> VIMC1   00:00:b5:04:04:2d  Ether/IP/TCP/telnet 
40  21/09/92 16:48:33.617275  158  08:00:69:01:0b:c7  pride -> covet     08:00:69:01:0b:f2  Ether/IP/UDP/nfs 
41  21/09/92 16:48:33.618446   60  00:00:b5:04:04:2d  VIMC1 -> narrows   aa:00:04:00:eb:07  Ether/IP/TCP/telnet 
42  21/09/92 16:48:33.622774  170  08:00:69:01:0b:f2  covet -> pride     08:00:69:01:0b:c7  Ether/IP/UDP/nfs 
43  21/09/92 16:48:33.626983   72  aa:00:04:00:eb:07  narrows -> VIMC1   00:00:b5:04:04:2d  Ether/IP/TCP/telnet 
44  21/09/92 16:48:33.628720   60  00:00:b5:04:04:2d  VIMC1 -> narrows   aa:00:04:00:eb:07  Ether/IP/TCP/telnet 
45  21/09/92 16:48:33.629612  154  08:00:69:01:0b:c7  pride -> covet     08:00:69:01:0b:f2  Ether/IP/UDP/nfs 
46  21/09/92 16:48:33.634248  170  08:00:69:01:0b:f2  covet -> pride     08:00:69:01:0b:c7  Ether/IP/UDP/nfs 

Decode window (selected packet): 

14 bytes Ethernet Header 
    6 bytes   destination Ethernet address   00:00:b5:04:04:2d   VIMC1 
    6 bytes   source Ethernet address        aa:00:04:00:eb:07   narrows 
    2 bytes   type                           0x0800              IP 
20 bytes IP header 
    4 bits    version                        4 
    4 bits    header length (longwords)      5 
    1 byte    type of service                0x0 
    2 bytes   total length                   58 
    2 bytes   identification                 0x4367 
    3 bits    flags                          bits 000 
    13 bits   fragment offset                0x0 
    1 byte    time to live                   59 
    1 byte    protocol                       0x6                 TCP 
    2 bytes   header checksum                0xe880 
    4 bytes   source IP address              134.7.70.1          cc.curtin.edu.au 
    4 bytes   destination IP address         134.7.1.199         VIMC1.cs.curtin.edu.au 
20 bytes TCP Header 
    2 bytes   source port                    23                  telnet 
    2 bytes   destination port               4499 
    4 bytes   sequence number                1220620057 
    4 bytes   acknowledgement number         253825588 
    4 bits    header length (longwords)      5 
    6 bits    reserved                       0 
    6 bits    flags                          bits 011000 
    2 bytes   window                         5615 
    2 bytes   checksum                       0xb8a6 
    2 bytes   urgent pointer                 0x0 
18 bytes telnet packet 
    18 bytes  data                           "\n  To local file: " 

Hexadecimal and ASCII dump window: 

0x000 - 0000   00 00 b5 04 04 2d aa 00 04 00 eb 07 08 00 45 00   .....-........E. 
0x010 - 0016   00 3a 43 67 00 00 3b 06 e8 80 86 07 46 01 86 07   .:Cg..;.....F... 
0x020 - 0032   01 c7 00 17 11 93 48 c1 2f 19 0f 21 12 34 50 18   ......H./..!.4P. 
0x030 - 0048   15 ef b8 a6 00 00 0a 20 20 54 6f 20 6c 6f 63 61   .......  To loca 
0x040 - 0064   6c 20 66 69 6c 65 3a 20                           l file: 

Figure 7. Using packetman to examine a telnet packet. 








































increase of 10 to 150 times over NIT. Although 
implementing code to support BPF is trivial, we have 
no way of testing the implementation as we do not have 
access to the required SunOS kernel source. Unfortunately, 
a port of BPF for Ultrix and other common platforms is not 
currently available, but it is hoped that a mechanism will be 
incorporated at compile time to determine whether BPF 
support is available; otherwise the system standard network 
tap will be used. This will allow us to support as many 
platforms as possible in the same manner as Tcpdump. 

A major contributor to poor performance is the sheer 
volume of X protocol transmissions. Although this is 
hardly surprising within a dynamic graphical application 
such as ours, we can reduce the number of X server 
requests to a fraction of what they are now by improving 
font handling within the X server. Enhancements such as 
these[6] have already been made available in the public 
domain as extensions to the X11R5 font server. Although 
such extensions are currently available, they are not 
implemented on a broad cross-section of X capable 
workstations. We are hoping that extensions to the X 
protocol, or a similar mechanism, will be included as part 
of the X11R6 distribution to provide this support. This will 
allow us to re-work our display routines and provide a 
reduced display update interval. 

Calculation of traffic flow statistics also has room for 
improvement. At present, our flow calculations are 
inaccurate due to inter-chunk gaps being ignored. We have 
decided to implement the augmented loadring 
algorithm[24] to provide us with the correct flow statistics. 

Packetman requires a great deal of work in terms of 
additional protocol modules. We will continue to expand 
the capabilities of the IP module as best we can, but cannot 
guarantee that other major protocols such as DECnet, 
Novell, EtherTalk etc. will be attempted 12 . 

5.2 SNMP Management Station 

Implementing a useful network management station 
using SNMP will be the most challenging aspect of the 
project. We have not decided how the display will work, as 
the question of the ideal method of auto-topology remains 
unanswered[40]. We would like to implement an 
RMON[39] capable management station that will access 
the active host information and connection matrices of 
RMON probes. To date we are still concentrating our 
efforts on passive monitoring, but hope to enter the 
management scenario in the near future. 

5.3 Representing Physical Networks 

A project was initiated early in 1992 to investigate the 
possibilities of physical network representations. It was 


12. If there is sufficient interest, we will consider releasing the 
source for public review. 


envisaged that this tool would be used to document 
physical network schemas with a minimum of effort on the 
user’s part. This tool would use icons extracted from 
familiar vendor logos, and be able to interconnect these 
icons with lines representing Ethernet cable and fibre optic 
cable. 

Although a very successful undergraduate project, in 
terms of X programming and network design experience, 
this tool requires a great deal more development in order to 
be usable. We hope to continue the development of this tool 
and perhaps also to couple it with a monitoring engine to 
provide a crude auto-discovery capability. 

5.4 Geographical Traceroute 

Most people are familiar with the Internet route 
tracing facility, traceroute[15]. Some people may also be 
aware of the uumap project which attempts to identify sites 
globally. If these two information resources were coupled 
with the CIA World Bank II[7] world map, it would be 
possible to produce a “geographical traceroute” which 
displays the geographical “point-to-point” routes taken to 
reach a remote site. 

The information provided by such a tool could be used 
either as a diagnostic tool or as an educational aid for those 
who have not grasped the basics of IP routing. Other 
possibilities such as overlaying national and international 
backbones and satellite links could provide a more in-depth 
view of Internet connectivity. This project is currently 
under development. 

6 Conclusion 

We have presented an overview of a research project 
currently underway at Curtin University. At present we 
have implemented some utilities which we feel provide the 
network administrator with an immediate and intuitive 
understanding of network utilisation. We have successfully 
used these tools to diagnose and correct a variety of faults 
on several networks. 

These tools were not designed to replace existing 
network management systems, rather to supplement them. 
We believe that by adding the information provided by 
these monitoring tools to the facilities provided by 
management tools like SunNet Manager[35] or HP 
OpenView[11], the network administrator will receive a more 
complete and powerful management system. 

7 Availability 

Interman, etherman, and packetman are available via 
anonymous ftp from host ftp.cs.curtin.edu.au as a 
binary distribution for DECstations and SUN 
SPARCstations. Depending on which machine type you 
require, the directories pub/netman/sun4c and dec-mips 
contain the files: etherman.tar.Z, interman.tar.Z, and 
packetman.tar.Z. At present, source code is not available. 

8 Acknowledgements 

Much of our work has been a distillation of some of 
the publicly available source code provided by the Internet 
community. Many thanks to Jeff Mogul, Steve McCanne, 
Van Jacobsen, Craig Leres, Dave Curry, Robert Braden 
and Annette DeSchon and a multitude of developers across 
the Internet for their contributions of utilities such as 
Tcpdump, nfswatch, NNStat etc. which helped inspire the 
project. Thanks to George Ferguson for his efforts in 
xarchie which guided the development of our GUIs. 
Special thanks go to Phil Dench for his original 
implementation of etherman under IRIX (which will one 
day return), and his willingness to help when gl was all too 
hard. Thanks also to Peter Elford and Geoff Huston for 
encouragement, testing and colour postscript dumps. To all 
the current users across the Internet community, thanks for 
bug reports, suggestions, kind words and your patience. 
Last of all, thanks to all members of staff within the 
department who contributed to this project in many ways. 

References 

[1] Boggs, D.R., Mogul, J.C., Kent, C.A. Measured Capacity of an 
Ethernet: Myths and Reality. In Proceedings of SIGCOMM ’88 
(Stanford, CA, Aug. 1988), ACM. 

[2] Braden, R.T. A Pseudo-machine for Packet Monitoring and 
Statistics. In Proceedings of SIGCOMM ’88 (Stanford, CA, Aug. 
1988), ACM. 

[3] Braden, R.T., DeSchon, A.L. NNStat : Internet Statistics Collection 
Package -- Introduction and Users Guide Release 2.3 edition, USC 
/ Information Sciences Institute, Marina del Ray, CA, 1989. 

[4] Case, J., Fedor, M., Schoffstall, M., Davin, J.A. Simple Network 
Management Protocol. Request for Comments #1157, Network 
Information Centre, SRI International, May 1990. 

[5] Curry, D., Mogul, J. nfswatch(8L) manual page. Purdue University, 
IL, March 1993. 

[6] Deininger, A., Meyers, N. Using the New Font Capabilities of 
HP-Donated Font Server Enhancements. The X Resource, O’Reilly & 
Associates, Inc. CA, Issue 3, 1993. 

[7] Dellinger, J. CIA World Bank II map data and source code. 
Stanford University 

[8] Digital Equipment Corporation. The Ethernet, A Local Area 
Network: Data Link Layer and Physical Layer Specifications 
(Version 1.0). Digital Equipment Corporation, Intel, Xerox, 1980. 

[9] Digital Equipment Corporation. packetfilter(4), Ultrix V4.2 
Manual. 

[10] Gettys, J., Scheifler, R.W., Newman, R. Xlib C Language X 
Interface, X Version 11 Release 5. MIT X Consortium, MA, 
Aug. 1991. 

[11] Hewlett Packard. HP Openview reference manual. 

[12] Hook, D.G. The hershey(3) manual page. Melbourne University, 
Melbourne, 1992. 

[13] Hook, D.G. The vogl(4) manual page. Melbourne University, 
Melbourne, 1992. 

[14] Hull, C. The EtherView(1) manual page. University of Vermont, 
1991. 

[15] Jacobsen, V. The traceroute(8) manual page. Lawrence Berkeley 
Laboratory, Berkeley, CA, Feb. 1989. 

[16] Jacobsen, V., Leres, C., and McCanne, S. The Tcpdump(1) manual 
page. Lawrence Berkeley Laboratory, Berkeley, CA, June 1989. 

[17] Kiethley, K.S. Three-D Athena Widgets (Xaw3d) source code. Jet 
Propulsion Laboratories, NASA, Feb. 1993. 

[18] McCanne, S. The BPF Manual page. Lawrence Berkeley 
Laboratory, Berkeley, CA, May 1991. 

[19] McCanne, S., Jacobsen, V. The BSD Packet Filter: A New 
Architecture for User-level Packet Capture. In 1993 Winter 
USENIX conference proceedings (San Diego, CA, Jan. 1993). 

[20] McCloghrie, K., Rose, M. Management Information Base for 
Network Management of TCP/IP Based Internets: MIB-II. Request 
for Comments #1213, Network Information Centre, SRI 
International, March 1991. 

[21] McCormack, J., Asente, P., Swick, R.R. X Toolkit Intrinsics - C 
Language Interface, X Version 11 Release 5. Massachusetts 
Institute of Technology, MA, August 1991. 

[22] Metcalf, R.M., Boggs, D.R. Ethernet: Distributed Packet Switching 
for Local Computer Networks. Communications of the ACM 
19(7):395-404, July 1976. 

[23] MIT X Consortium. X11 Windowing System Release 5. 
Massachusetts Institute of Technology, MA, Aug. 1991. 

[24] Mogul, J. C. Efficient Use of Workstations for Passive Monitoring 
of Local Area Networks. In Proceedings of SIGCOMM '90 
(Philadelphia, PA, Sept. 1990), ACM. 

[25] Mogul, J. C., Rashid, R.F., and Accetta, M.J. The Packet Filter: An 
Efficient Mechanism for User-level Network Code. In Proceedings 
of 11th Symposium on Operating Systems Principles (Austin, TX, 
Nov. 1987), ACM, pp. 39-51 

[26] Peterson, C.D. Athena Widget Set - C Language Interface, X 
Version 11, Release 5. Massachusetts Institute of Technology, MA, 
July 1991. 

[27] Pochmara, J. xtr source code, Dec. 1989, Oregon Graduate 
Institute, Beaverton, OR 

[28] Postel, J. Internet Protocol, Request for Comments #791, Network 
Information Centre, SRI International, September 1981. 

[29] Silicon Graphics, Inc. NetVisualyzer reference manual, Silicon 
Graphics, Inc. Mountain View, CA. 

[30] Silicon Graphics, Inc. snoop(6D) manual page. Silicon Graphics, 
Inc. Mountain View, CA, 1990. 

[31] Silicon Graphics, Inc. Graphics Library reference manual, C 
edition (version 4.0). Silicon Graphics, Inc. Mountain View, CA, 
1990. 

[32] Sun Microsystems, Inc. etherd(8c); SunOS 4.1.1 Reference 
Manual. Mountain View, CA, Oct. 1990. Part Number: 800-5480-10. 

[33] Sun Microsystems, Inc. etherfind(8c); SunOS 4.1.1 Reference 
Manual. Mountain View, CA, Oct. 1990. Part Number: 800-5480-10. 

[34] Sun Microsystems, Inc. NIT(4P); SunOS 4.1.1 Reference Manual. 
Mountain View, CA, Oct. 1990. Part Number: 800-5480-10. 

[35] Sun Microsystems, Inc. SunNet Manager 1.1 Installation and 
User’s Guide. Mountain View, CA, 1991. 

[36] Sun Microsystems Inc. traffic(1c); SunOS 4.1.1 Reference Manual. 
Mountain View, CA, Oct. 1990. Part Number: 800-5480-10. 

[37] Tremblay, J.P., Manohar, R. Discrete Mathematical Structures 
with Applications in Computer Science. McGraw-Hill 1975. 

[38] Trewitt, G. M. Topological Analysis of Local-area Internetworks. 
In Proceedings of SIGCOMM ’88 (Stanford, CA, Aug. 1988), 
ACM. 

[39] Waldbusser, S. Remote Network Monitoring Management 
Information Base. Request for Comments #1271, Network 
Information Centre, SRI International, Nov. 1991. 

[40] Waldbusser, S. Exposing the Myths about Autotopology. The 
Simple Times, C/o Dover Beach Consulting, Inc., CA, 1(1):5-6, 
March 1992. 


How does my code know when it is running? 

Chris McDonald 

chris@budgie.cs.uwa.edu.au 

Department of Computer Science, The University of Western Australia, 

Nedlands, Western Australia, 6009. 

ABSTRACT 

This paper describes a network simulator, xnet, which enables experimentation with various data- 
link layer, network layer, routing and transport layer networking protocols. In addition, different 
application and physical layers may be provided which exhibit varying statistical characteristics of 
message generation and data transmission, such as those of the application protocols telnet and ftp. 
Being an interactive simulator in which the discrete event paradigm is not strictly applicable, a 
different user-level scheduling method, not typically available under Unix, is employed. Networks 
are described in a specification language in which hosts, routers and their interconnections may be 
defined and constrained. Network protocols are themselves written in standard C code - no 
preprocessing or esoteric practices are required to support the user-level scheduling. In particular, 
each node has its own set of variables and executes unique instances of event-handling functions 
when interesting network events occur. Under execution, nodes may be made to reboot, (impolitely) 
crash, (politely) shut down and reboot, pause and (hardware) fail. Links may be severed or made 
unreliable, while their propagation delays and buffer sizes are changed. 

Successive versions of the simulator have now been used successfully by about 300 students in our 
third year undergraduate course. This paper will discuss the motivation for such a simulation 
environment and its implementation and execution under SunOS and X-windows. 


COMPUTER NETWORKING IN THE COMPUTER SCIENCE CURRICULUM 

As a reflection of both computer industry requirements and contemporary computer science research there is an 
increasing emphasis on computer networks and data communications in the computer science curriculum. The 
directions taken within these courses address selected topics from the following four areas: 

• emphasis on the physical properties of data communications, mechanisms for encoding signals on the 
physical medium, the origins and significance of electrical interference, and the physical construction of 
networks. 

• emphasis on the International Standards Organization's Open Systems Interconnection (ISO/OSI) 
networking reference model. 

• emphasis on shared medium, Local Area Network (LAN) technologies, such as the Ethernet specification 
and token ring networks. 

• emphasis on "open-networking" and internetworking, the Internet Protocol stack, network file systems, 
client-server applications and remote execution paradigms. 

Unfortunately, due to restrictions imposed on courses, such as course length, number of students, required 
course emphasis (for, say, computer science or electronic engineering students), availability of hardware and 
software facilities, and perceived course highlights, not all aspects of computer networking can be addressed 
equally. The University of Western Australia has offered a computer networks course to third year 
undergraduate students for twelve years now with a traditional emphasis on the ISO/OSI reference model and 
more recent emphasis on internetworking. Tanenbaum's "Computer Networks" [TAN88] and Stallings' "Data 
and Computer Communications" [STA91] have been the recommended textbooks. The course still initially 
examines the ISO/OSI model, as, to paraphrase Tanenbaum [TAN88], "While [the ISO/OSI] model may or may 
not be a good way to organize real, live computer networks, it makes an excellent framework for organizing [a 
course] about them." 



TEACHING NETWORKING THROUGH SIMULATION 


The primary motivation in the use and design of a network simulator has been to offer tractable hands-on 
networking exercises providing visual feedback. Networking is an inherently practical area of study and 
students demand something sufficiently challenging. 

The design objectives of the networking environment were: 

• To provide an environment for the design and operation of practical exercises for a large number of 
students - typically one hundred and twenty per semester. While it is appreciated that much smaller class 
sizes would provide more hands on opportunity for each student, computer networking is an emerging 
technology, popular with many students. 

• The large number of students has meant that practical network hardware and wiring exercises, as described 
by Burston et al. [BUR87], Hughes [HUG89] and Lions [LIO89], were not possible. To quote Lions, "A 
software simulation would be worth a thousand wires." 

• Ironically, even though the students would be working on our Department's network connecting PCs, 
SunSPARC workstations and X-window terminals, it has not been possible to use this network to test 
student-written networking software. The existing network had to remain reliable to perform real work, 
and its reliability and security could not be compromised, either accidentally or deliberately. 

• On a related point, it is desirable that almost all aspects of the networking simulation could be controlled. 
While practical network exercises, such as those described by Abdel-Wahab [ABD88], do provide 
experience with the software interfaces to live computer networks, most of the control is relinquished to the 
operating system and the existing network software. Real networks are too reliable. The expected error rate 
must be increased by roughly eight orders of magnitude so that students actually do witness errors which 
their code must handle. 

• Ideally the software must enable the examination of many different aspects of computer networking - we 
have been successful in examining checksum algorithms, different data-link protocols, fragmentation 
methods, virtual-circuit and datagram network layers, both static and dynamic routing algorithms, 
connection management and compression algorithms. If two or three of these aspects are examined by 
different student projects in successive years, we reduce the incidence of both student plagiarism and 
laboratory demonstrator boredom. 

While addressing all of these pedagogical requirements, it was also necessary to present a simulator that was 
realistic (read challenging) and not too rigid in its method of use. To successfully design (or demonstrate) 
realistic routing protocols, it is necessary for routing decisions to be required at a number of intermediate nodes 
between source and destination. Typically, at least ten nodes are required to introduce sufficient complexity. At 
the same time we require that protocols be written in the standard (not a contrived variant of the) C 
programming language. Protocol implementations should have access to all standard functions, such as 
dynamic allocation, string handling, mathematical and I/O functions. While it is clear that these more practical 
objectives can be met by implementing each network node as a distinct Unix process linked with the appropriate 
libraries, such an approach makes excessive demands on computing resources when simultaneously required 
by upwards of twenty students. 


THE xnet NETWORKING ENVIRONMENT 

The xnet simulator presents a software infrastructure which enables experimentation with various data-link 
layer, network layer, routing and transport layer networking protocols. In addition, different application and 
physical layers may be provided which exhibit varying statistical characteristics of message generation and data 
transmission. For example, routing simulators for different applications such as telnet or ftp may be designed 
which exploit the different message size characteristics of these applications. 

xnet first accepts a number of command line options and then either reads from a network topology file 
(described later) or generates a random network. Each topology file describes a point-to-point network as a (not 
necessarily fully) connected graph. The network is described in terms of its nodes and the connections between 
them. All attributes of both the nodes and their connections initially define and constrain the network but may 
be later modified while the simulation is running. 


The nodes of the network may be either hosts or routers with hosts having the capacity to generate and receive 
messages through their application layer and routers having only the responsibility of delivering messages 
between hosts (and other routers). Node attributes may be both specified in the topology file and modified 
(either globally or on a per-node basis) while the network is running. These attributes include the rate of 
message generation, the minimum and maximum sizes of messages and whether or not to trace all node 
activity. Nodetypes are defined as a class of nodes exhibiting the same characteristics. Instances of different 
nodetypes, for example either client nodes or server nodes, then form a network. 

Nodes are generally reliable; their reboot code is invoked when the simulation commences and they continue 
generating (and hopefully delivering) messages until the simulation ceases. There is, however, a facility to 
specify the expected rates of node failure and repair time which, of course, introduces connection management 
demands on network protocols. While executing, nodes may also be paused and (politely) shut down and 
rebooted to introduce additional problems in end-to-end retransmission protocols. 

The bidirectional links between the nodes are unreliable, being subject to possible frame corruption and loss. 
Link attributes may be specified in the topology file and modified (either globally or on a per-link basis) while 
the network is running. These attributes include the link bandwidth, the propagation delay between endpoints, 
the probability of frame loss and corruption, the transmit buffer size and relative costings (per byte and frame) 
for transmitting frames along that link. Link attributes defined before all nodes in the topology file become the 
default attributes for all following link definitions (unless overridden). Linktypes are defined as a class of links 
exhibiting the same attributes - instances of these linktypes then connect nodes. 
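
The link attributes listed above could be grouped as in the following minimal sketch. It is not taken from xnet: the structure, the field names and the units are assumptions made for illustration, as is the simple model in which a frame's delivery time is its serialization time plus the propagation delay.

```c
#include <assert.h>

/* Hypothetical grouping of the link attributes named in the text;
   the field names and units are illustrative, not xnet's own. */
struct link_attrs {
    long   bandwidth_bps;       /* link bandwidth, bits per second        */
    long   propagation_usec;    /* propagation delay between endpoints    */
    double prob_frame_loss;     /* probability that a frame is lost       */
    double prob_frame_corrupt;  /* probability that a frame is corrupted  */
    long   tx_buffer_bytes;     /* transmit buffer size                   */
    long   cost_per_byte;       /* relative cost of each byte sent        */
    long   cost_per_frame;      /* relative cost of each frame sent       */
};

/* Microseconds for a frame to arrive at the far end of the link:
   time to clock the bits onto the link plus the propagation delay. */
long long delivery_time_usec(const struct link_attrs *l, long frame_bytes)
{
    assert(l->bandwidth_bps > 0 && frame_bytes >= 0);
    long long bits = (long long)frame_bytes * 8;
    return bits * 1000000LL / l->bandwidth_bps + l->propagation_usec;
}

/* Relative cost of sending one frame along the link. */
long frame_cost(const struct link_attrs *l, long frame_bytes)
{
    return l->cost_per_frame + l->cost_per_byte * frame_bytes;
}
```

Under these assumptions a 700 byte frame on a 56000 bit/s link with a 2500 microsecond propagation delay takes 102500 microseconds to arrive.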


PROTOCOL SOURCE FILES AND SCHEDULING 


Source files containing code providing each of the application layer, physical layer and the "internal" data-link, 
network and routing layer protocols, are specified in either the topology file or with command line arguments. 
The simulator uses programmatic dynamic binding under SunOS to link the user-nominated protocols into the 
address space [SUN89]. The alternative approach of providing the simulator as a library archive with which 
user-nominated protocols should be linked has proved difficult for students to comprehend. Early 
implementations of xnet [McD91] employed the obvious makefile to compile the user protocols and then link 
them with the xnet archive. Students were confused with the need to write, compile and link code without the 
need to write a main() function and handle command line arguments. The current implementation of xnet is 
perceived by students as a testbed of network protocols which may be specified in the C programming 
language. There appears no consensus amongst students as to whether the compilation and linking processes 
are supported silently or whether the protocols written in C are interpreted at runtime. 
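
Programmatic dynamic binding of this kind can be sketched with the dlopen() family of functions. This is an illustration only, under the assumption that the simulator uses an interface of this style; the helper load_handler() and the handler signature are hypothetical, and on some systems the program must be linked with -ldl.

```c
#include <assert.h>
#include <dlfcn.h>
#include <stdio.h>

typedef void (*handler_fn)(void);    /* hypothetical handler signature */

/* Load a shared object and look up one function in it.
   Returns NULL if the object or the symbol cannot be found.
   On success the object is deliberately left loaded, since the
   returned function lives inside it. */
handler_fn load_handler(const char *so_path, const char *symbol)
{
    assert(so_path != NULL && symbol != NULL);

    void *handle = dlopen(so_path, RTLD_LAZY);
    if (handle == NULL) {
        const char *err = dlerror();
        fprintf(stderr, "dlopen: %s\n", err ? err : "unknown error");
        return NULL;
    }

    handler_fn fn;
    *(void **)(&fn) = dlsym(handle, symbol);   /* POSIX-suggested idiom */
    if (fn == NULL) {
        const char *err = dlerror();
        fprintf(stderr, "dlsym: %s\n", err ? err : "symbol not found");
        dlclose(handle);
    }
    return fn;
}
```

A caller might use load_handler("./protocol.so", "reboot_node") and invoke the returned pointer when the node reboots; no main() function is needed in the protocol source itself.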


    gcc -ansi -Wall -c protocol.c 
    ld -p -d -o protocol.so protocol.o 

    protocol.c: 

        int i; 
        int j = 3; 
        static int k; 

        void function() 
        { 
            static int x; 
            int y; 

            /* code of function */ 
        } 

    protocol.so (boundaries marked by the symbols end, edata and etext): 

        uninitialized data segment      i: 
        initialized data segment        j: .word 3 
                                        k: .word 0 
                                        x: .word 0 
        padding to PAGSIZ 
        (read-only) text segment        code of function() 

Fig 1. Building a shared object file from a C source file 








Protocol source files, specified in either the topology file or on the command line, are compiled and linked if 
"out-of-date" with respect to their corresponding shared object file (e.g. protocol.so). As this process is 
performed either in the absence of the shared object file or if the source file has been more recently modified, 
xnet subsumes some of the responsibilities of make(1). Although this approach requires xnet to 
process extra switches for the compilation and linking activities (e.g. to support either traditional "K&R" C or 
ANSI-C, or to link with, say, the mathematics library) we argue that a greater understanding of the protocols 
themselves results. Another switch to xnet reveals the entire compilation and linking activities for the pedantic 
(see Fig 1). 
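
The "out-of-date" test described above can be sketched by comparing file modification times with stat(). The function name needs_rebuild() and its return convention are assumptions made for illustration; the text does not say how xnet performs the check.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/stat.h>

/* Returns 1 if the shared object must be (re)built, 0 if it is up to
   date, and -1 if the source file itself cannot be examined. */
int needs_rebuild(const char *source, const char *shared_object)
{
    struct stat src, obj;

    assert(source != NULL && shared_object != NULL);

    if (stat(source, &src) != 0)
        return -1;                        /* no source: nothing to build  */
    if (stat(shared_object, &obj) != 0)
        return 1;                         /* shared object is absent      */
    return src.st_mtime > obj.st_mtime;   /* source modified more recently */
}
```

Only when needs_rebuild("protocol.c", "protocol.so") returns 1 would the compile and link commands need to be run.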

Having access to each source file's shared object image introduces another significant advantage which we have 
not yet seen in other network simulators. It is typical for all network nodes to execute the same protocol 
implementations (with perhaps simplified versions running in router nodes). Each protocol consists of a 
readonly, re-entrant text image which may be shared with all other nodes executing the same protocol. If any 
two nodes require the same protocol text segment, only one copy of this segment is dynamically loaded (into 
the process' heap space). Each protocol also (typically) contains both initialized and uninitialized data segments 
and unique instances of these are of course required for each node (see Fig 2). As each node is scheduled for 
execution using xnet's internal round-robin discrete event scheduler, the data segment of the outgoing node is 
saved and the data segment of the incoming node is copied to the address expected by the text segment. 



Fig 2. xnet's round-robin scheduling of nodes with shared text segments 
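
The save-and-restore step performed at each node switch can be sketched as two memory copies. This is a simplified model, not xnet's code: the fixed-size arrays, the names live_data, saved_data, load_nodes() and switch_to(), and the constants are all assumptions for illustration. A real implementation would copy to the address at which the shared text segment expects its data.

```c
#include <assert.h>
#include <string.h>

#define DATA_SIZE 64     /* assumed size of one protocol's data segment */
#define MAX_NODES 8

/* The single area that the shared text segment reads and writes. */
unsigned char live_data[DATA_SIZE];

/* One private copy of the data segment for each node. */
static unsigned char saved_data[MAX_NODES][DATA_SIZE];

static int current_node = -1;    /* -1: no node is swapped in */

/* Give every node a pristine copy of the initialized data image. */
void load_nodes(const unsigned char *initial_image, int nnodes)
{
    assert(nnodes > 0 && nnodes <= MAX_NODES);
    for (int n = 0; n < nnodes; n++)
        memcpy(saved_data[n], initial_image, DATA_SIZE);
    current_node = -1;
}

/* Save the outgoing node's data, then copy in the incoming node's.
   The outgoing segment is saved even if it was not modified. */
void switch_to(int incoming)
{
    assert(incoming >= 0 && incoming < MAX_NODES);
    if (current_node >= 0)
        memcpy(saved_data[current_node], live_data, DATA_SIZE);
    memcpy(live_data, saved_data[incoming], DATA_SIZE);
    current_node = incoming;
}
```

With this arrangement a variable written while node 0 is swapped in keeps its value across the execution of node 1, although both nodes run the same code against the same address.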

In essence, xnet's internal scheduler mirrors the normal activities of Unix's process scheduler, albeit without any 
hardware support. Nodes (akin to Unix processes) may share readonly text segments with only their data 
segments requiring independence. Unfortunately, there is no hardware support for xnet's internal scheduler; for 
example, there is no copy-on-write support for nodes and each outgoing node's data segment is saved even if that 
segment was not modified. In practice, however, a node will always modify at least one of its variables while 
swapped in. As all nodes execute within a single Unix process, nodes share a single heap space. As a 
consequence there is again no hardware protection of the address space and incorrectly written protocols can 
exhibit some strange actions as the result of "pointer stomping". 

xnet does not employ a pre-emptive scheduler; instead, event-handling functions must execute to completion 
and return before the scheduler can dispatch a new event. In particular, xnet must also perform explicit 
dispatching of X-window events. By calling XView's notify_dispatch() routine [HEL91] frequently 
enough we present the illusion that xnet is constantly responding to user input while still performing the 
simulation. This overall approach to scheduling has the unfortunate side-effect of xnet "hanging" if an 
event-handler is incorrectly written and does not return. The motivation for this scheduling style, despite this 
shortcoming, is that the user-written protocols are intended to behave as if executing within an operating 
system kernel: they respond to interrupts and, while the protocols are themselves idle, the operating system 
executes other user-level code. 

The most feasible solution to the scheduler's shortcoming appears to be to use a pre-emptive variant of a 
lightweight process scheduler to manage each node, and possibly link, as a separate thread of execution. 
Although considered a longer term goal, the incorporation of lightweight processes has not been 
undertaken for two reasons. First, an important objective of the simulator was to enable students to write their 
protocols in standard C and not to be concerned about pre-emptive scheduling or constrained access to 
variables. Second, most lightweight process libraries only support execution of shared text segments and do not 
save and restore the data segments of individual threads. In particular, statically declared variables are implicitly 
shared between threads and the scheduler cannot present the illusion of having many network nodes executing 
independently within the single Unix process. Any potential problems in integrating X-windows and a 
lightweight process scheduler have been feared but not considered. 


WHAT THE STUDENT SEES 

The network simulation commences by initializing all internal data structures, such as the application and 
physical layer queues and statistics, and then rebooting each node. The attributes of the node, such as its 
address, name, number of links and application message constraints are available to each node in a global data 
structure. Students are instructed to treat this structure as a readonly structure - directly modifying node 
attributes does not affect the simulation other than to confuse the offending node. The structure itself is saved 
and restored as an additional activity of xnet's scheduler. Link attributes are similarly available to each node in 
readonly data structures that are saved and restored by the scheduler. Surprisingly, unless a special command 
line switch is presented to xnet, each node is unaware of even the number of nodes or the adjacency graph of the 
whole network. For example, each node knows how many physical links it has, but not to which nodes it is 
connected. These properties must be deduced via the protocols themselves - typical responsibilities of a 
network layer protocol. 

Each node is rebooted by calling a nominated function within each node. This function, by default 
reboot_node(), is the only function actually required by the simulator - no assumptions are made as to how 
protocols are implemented. The function invoked to reboot each node has the responsibility of initializing static 
variables and structures within the node and possibly allocating dynamic memory whose need is determined 
from the node and link attributes. 

The reboot_node() function is in fact a special case of a general event-handling function that is invoked 
when events of interest occur within a node. Protocols may at any time (typically within reboot_node() 
itself) register with the scheduler functions to be invoked when interesting events occur. Events occur when a 
node reboots, when the application layer has a message for delivery, when the physical layer has received a 
frame on one of its links, when a software timer expires, when a node is (politely) shut down, and when one of 
the five debugging buttons is selected from the node's window. No event is delivered if a node pauses, crashes 
or suffers a hardware failure. These conditions must be deduced by the protocols themselves. Event-handling 
functions are remembered in each node's context as there is no requirement that each node declare the same 
functions (if the events are to be handled at all). For example, a router need not (and in fact cannot) register a 
function to handle the arrival of a message from its application layer (a router has no application layer). 
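
The registration and dispatch of event-handling functions might be organised as in the sketch below. Every name in it (the enum values, struct node, set_handler(), dispatch() and the example handler on_frame()) is hypothetical; only the list of events, the per-node handler tables and the rule that a router cannot handle application layer messages come from the description above.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical event names modelled on the events listed in the text. */
enum event { EV_REBOOT, EV_APPLICATION_READY, EV_PHYSICAL_READY,
             EV_TIMER, EV_SHUTDOWN, EV_DEBUG_BUTTON, NEVENTS };

typedef void (*handler_fn)(enum event ev, long timestamp);

struct node {
    int        is_router;           /* routers have no application layer */
    handler_fn handlers[NEVENTS];   /* NULL: event is not handled        */
};

/* Remember a handler in the node's own context.
   Returns 0 on success, -1 if the registration is refused. */
int set_handler(struct node *n, enum event ev, handler_fn fn)
{
    assert(ev < NEVENTS);
    if (n->is_router && ev == EV_APPLICATION_READY)
        return -1;
    n->handlers[ev] = fn;
    return 0;
}

/* Deliver one event. The handler runs to completion before this
   returns. Returns 1 if a handler was invoked, otherwise 0. */
int dispatch(struct node *n, enum event ev, long timestamp)
{
    assert(ev < NEVENTS);
    if (n->handlers[ev] == NULL)
        return 0;
    n->handlers[ev](ev, timestamp);
    return 1;
}

/* Example handler: counts the frames reported by the physical layer. */
int frames_seen = 0;

void on_frame(enum event ev, long timestamp)
{
    (void)ev;
    (void)timestamp;
    frames_seen++;
}
```

Because each node carries its own handler table, a host and a router running in the same simulation may register entirely different sets of functions.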

Students using xnet hence find themselves programming using an inverted control structure. Rather than 
implementing code which simply makes calls to library routines, the student's code is invoked by xnet when 
interesting events need servicing. While this event-driven programming paradigm is not unusual for those 
used to similar environments, such as windowing systems, it is typically the first such exposure for our 
students. Students feel the burning desire to write their own main() function and to either poll the application 
and physical layers or wait(2) for events as if they were Unix signals. 


THE xnet PROGRAMMING INTERFACE 

Event-handling functions must be initially registered to process incoming events. The only exception to this is 
that the function reboot_node() is assumed to be the function to invoke on node reboots (unless overridden 
with a command line option or in the topology file). Each event-handler is invoked with two parameters - the 
event causing the invocation and a timestamp. The timestamp has significance only for functions handling 
timer events - it does not reflect the current simulation time. 

xnet's application layer (either the internal default version or one provided on the command line) has the 
responsibility of generating messages to be delivered to other application layers. Application layers do not 
generate messages for their own node or for routers. The default application layer prefers to generate messages 
for "close nodes", with a message having twice the chance of being for an immediate neighbour as for a node 
two hops away (and so on). Once a node is informed that the application layer has a message for delivery, the 
message is read within an event-handler using the function read_application(). The function provides the 
address of the message's destination node, the message and its length. Each message, itself, carries the values of 
its own length, source and destination, connection and sequence numbers, some fictitious data (from a built-in 
version of BSD's chargen) and is checksummed. The apparent redundancy in each message enables the receiving 
node to ensure that it has been presented with an expected, non-corrupted message. To provide flow control of 
the application layer, message generation in each node may be throttled on either a per-destination or 
network-wide basis. 
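
The preference for "close nodes" can be modelled by giving each candidate destination a weight that halves with every extra hop. The sketch below is one way to realise that rule and is not xnet's implementation; the function names are hypothetical, and the caller is assumed to supply the hop counts and a uniform random number u in the range [0,1).

```c
#include <assert.h>

/* Relative weight of a destination `hops` links away: 1/2 for an
   immediate neighbour, 1/4 for a node two hops away, and so on. */
double hop_weight(int hops)
{
    assert(hops >= 1);
    double w = 1.0;
    for (int h = 0; h < hops; h++)
        w /= 2.0;
    return w;
}

/* Choose an index into hops[] with probability proportional to its
   weight, given u, a uniform random number in [0,1). */
int pick_destination(const int hops[], int n, double u)
{
    assert(n > 0 && u >= 0.0 && u < 1.0);

    double total = 0.0;
    for (int i = 0; i < n; i++)
        total += hop_weight(hops[i]);

    double target = u * total;
    for (int i = 0; i < n; i++) {
        double w = hop_weight(hops[i]);
        if (target < w)
            return i;
        target -= w;
    }
    return n - 1;    /* guards against floating-point rounding */
}
```

With one neighbour and one node two hops away, the neighbour is selected for two thirds of the values of u, which is twice the chance given to the more distant node.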

The physical layer (either the internal default version or one requested from the command line) has the 
responsibility of delivering data frames between nodes. Frames are delivered along links, numbered within 
each node from 1 to its number of connections. Link 0 is provided to copy a frame immediately from a node's 
output to input (a loopback facility). In general, the physical layer will randomly corrupt and drop data frames 
on all links other than 0. Corruption is introduced by simply XORing and interchanging a few bytes in 
unfortunate frames. While this corruption is not representative of "real-world" burst errors, it can be 
detected with "real-world" checksum algorithms. Functions such as write_physical(), read_physical() 
and write_physical_reliable() support transmission of user-supplied frames over nominated links. In 
conjunction with xnet's support of multiple software timers providing timeout facilities, these functions are 
used to develop data-link layer protocols. 

Most xnet functions return a traditional indication of their success through their return value. The node-specific
enumerated value xnet_errno reflects each node's most recent error and may be reported using the function
xnet_perror(). Other sundry functions include a number of standard 16-bit checksum algorithms (CRC16,
CCITT-16 [STA91] and the Internet checksum [BBP88]), which are required to detect frame corruption.
Another function, set_time_of_day(), may be used to advance or retard each node's concept of time. xnet's
scheduler maintains a separate time-of-day clock for each node, with each node's time initially differing by
a few minutes. The set_time_of_day() function has been used to develop a time synchronization protocol
similar to that described by Mills [MIL92].
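
As an illustration of the kind of checksum mentioned above, the following is a minimal sketch of the 16-bit one's-complement Internet checksum in the style of RFC 1071 [BBP88]. It is not xnet's own implementation; the function name internet_checksum() and its signature are assumptions made for this example, and bytes are combined in network byte order so that the result does not depend on the host.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Sketch of the Internet checksum (RFC 1071 style). Suitable for
 * frame-sized buffers; the 32-bit accumulator is folded only at the end. */
uint16_t internet_checksum(const unsigned char *data, size_t nbytes)
{
    uint32_t sum = 0;

    assert(data != NULL || nbytes == 0);

    /* add the data as a sequence of big-endian 16-bit words */
    while (nbytes > 1) {
        sum += ((uint32_t)data[0] << 8) | data[1];
        data += 2;
        nbytes -= 2;
    }
    if (nbytes == 1)                      /* pad an odd trailing byte with zero */
        sum += (uint32_t)data[0] << 8;

    /* fold the carries back into the low 16 bits */
    while (sum >> 16)
        sum = (sum & 0xFFFFu) + (sum >> 16);

    return (uint16_t)~sum;                /* one's complement of the sum */
}
```

A receiver can recompute the checksum over the data followed by the transmitted checksum; a result of zero indicates that no corruption was detected.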

On a network-wide or per-node basis, xnet's execution may be traced in a manner akin to that of trace(1).
Tracing appears on xnet's standard error stream and reports each node's event-handling functions being 
invoked (and returned from) and, within each handler, all function calls, their arguments and the function 
return values. Any function arguments modified by the functions (arguments passed by reference) are also 
reported together with any errors detected by xnet. 


EXECUTING xnet UNDER X-WINDOWS 

If xnet detects that it is running under the X-window system it displays the executing network using a rather 
obvious graphical representation. Nodes are presented as icons on a canvas; links are presented as line segments
between the node icons (see Fig 3). As the nodes change their execution state their iconic representation 
changes - a workstation or router icon for an executing node, an hourglass icon for nodes "paused" due to 
excessive user-level activity, a soldering iron and pliers icon for a node undergoing maintenance and the 
familiar tombstone icon for a crashed node. Transmission links carrying different volumes of traffic are 
represented by different width lines. 

The current node and link attributes (as possibly modified via the windowing interface) are available to each 
node in C data structures and variables declared in xnet's standard header file. These are initialized when each 
node is rebooted and updated as each node is scheduled for execution. The contents of these data structures are 
considered as readonly values and cannot be modified directly by the protocols. They may be successfully 
modified via the windowing interface. 



Fig 3. The typical appearance of xnet showing default node and link panels 

Selecting a node icon with the left mouse button results in a sub-window being displayed which contains the
output and statistics panels of that node. xnet re-implements the most frequently used standard I/O functions
so that each node's output may be displayed on its own output canvas. In this way, protocols may simply use
familiar functions such as printf() and puts() with no additional learning curve for students. Output
explicitly directed to, for example, stderr appears on the default output window (tty or pty). Each node's
attributes of message rates and sizes may be modified while the network is running by selecting choice buttons. 
Similarly, the default attributes of all nodes in the network may be simultaneously modified by selecting the 
default node panel from the bottom panel and then changing the node attributes thereon. Selecting a node with 
the right mouse button displays a menu from which the node may be forced to reboot, (impolitely) crash, 
(politely) shutdown and reboot, pause and (hardware) fail. 

Selecting a link using the left mouse button results in a sub-window being displayed which contains the 
statistics panel for that link. Links are bidirectional, so "selecting a link" means clicking on the link near the end 
of its source. The link-based attributes of costs and probabilities of error may be modified while the network is 
running by dragging slider panels. Similarly, the attributes of all links in the network may be simultaneously 
modified by first selecting the default link panel from the bottom panel and then changing the link attributes 
thereon (see Fig 4). 

Perhaps surprisingly, xnet was designed as a network simulator specifically for execution under X-windows. 
However, due to the large number of students requiring access to our limited number of X-terminals it has 
become necessary to "retro-design" xnet so that it also executes on character based terminals. As an example, 
each node's output may be directed (mirrored) to individual disk files if so requested with a command line 
argument. Once the gimmick of graphical control over the network wears off, students have been satisfied with 
xnet on character based terminals. 














               MESSAGES    BYTES    KBytes/sec
Generated           727   396556          3.40
Received OK         706   398223          3.42
Errors Recv'd         0        0


Fig 4. Executing a larger network with an individual node and link panel selected 


RELATED AND FURTHER WORK 

An early implementation of the network execution environment was used in first semester 1990 by 
approximately eighty students [McD91]. This early implementation consisted of a single central process 
through which a number of instances of the protocol code, each executing as individual Unix processes, 
directed their messages for delivery. The central process acted as a routing control centre to ensure delivery to 
the correct recipient process while possibly corrupting and losing messages. The protocol processes passed 
their messages to the central process using Berkeley sockets and datagrams in the Unix domain in the first 
version and then using kernel-level message passing in the second. Unfortunately, both of these 
implementations placed an excessive load on our Department's resources - a simple network of six nodes being 
developed by twenty students quickly introduced more Unix processes than could be supported. 

The early implementations supported only a single point-to-point domain in which nodes, links between nodes 
and the transmission costs along links could be defined. Only two probabilities, reflecting the likelihood of data 
frame corruption and loss, could be specified. All other attributes, which have since become definable within 
the topology file, were hard-coded within each node, with likelihood of corruption and loss set to zero by 
default. Even with this limited networking environment, a reasonable proportion of a point-to-point network 
could be simulated. 

There are a number of other network simulators available which also present a windowing interface to the
protocol designer. Of these, the two most significant are the Network Simulation Testbed, NeST
[BDS88], and the Maryland Routing Simulator, MaRS [ASD92]. Both of these simulators support the design of,
and experimentation with, advanced protocols, but neither is considered suitable as an undergraduate teaching
tool. In particular, they impose rather strong restrictions on the way in which protocols may be specified, rely
on predefined structures for inter-node communication and require protocols to be compiled and linked at the
same time as the rest of the simulator. Moreover, because most other protocol simulators are designed to
experiment with routing and connection protocols, they do not perform the rudimentary physical layer
functions of actually transmitting (copying) data frames and introducing transmission errors.

One of the early long-term goals of xnet's design was to permit experimentation with LAN protocols such as the
Ethernet and token-ring specifications. It has proved difficult to reconcile LAN concepts with the point-to-point
nature of xnet. We are currently modifying the excellent Ethernet performance simulator described in
[BAR93] to accept the existing xnet topology file format and to provide an X-window interface that permits
modification of attributes while under execution.

The simulator described in this paper is available to anyone interested in using the software in an educational 
environment. 


REFERENCES 

[ABD88] H.M. Abdel-Wahab, Experience in teaching communications software using Berkeley UNIX, in "ACM 
SIGCSE", Vol 20, No 4, pp32-37, December 1988. 

[ASD92] C. Alaettinoglu, A.U. Shankar, K. Dussa-Zieger and I. Matta, Design and Implementation of MaRS: A 
Routing Testbed, UMIACS-TR-92-103, CS-TR-2964, Institute for Advanced Computer Studies and 
Department of Computer Science, University of Maryland, Aug 1992. 

[BAR93] B.L. Barnett III, An Ethernet Performance Simulator for Undergraduate Networking, Proceedings of the
24th ACM Computer Science Education Technical Symposium '93, Indianapolis, Indiana,
pp145-150, Feb 1993.

[BBP88] R.T. Braden, D.A. Borman and C. Partridge, Computing the Internet checksum, RFC-1071, Sep 1988. 

[BDS88] D.F. Bacon, A. Dupuy, J. Schwartz and Y. Yemini, Nest: A Network Simulation and Prototyping Tool, in 
Proceedings of the 1988 Winter USENIX Conference, Mar 1988.

[BUR87] A.K. Burston, M.F. Schultz and K.W. Titmuss, TOYNET - A Network for Teaching LAN Principles, The 
University of New South Wales, School of Electrical Engineering and Computer Science, TR-8704, 
Aug 1987. 

[HEL91] D. Heller, XView Programming Manual, 3rd ed., "The Definitive Guides to the X Window System", Vol
7, O'Reilly & Associates, Sep 1991.

[HUG89] L. Hughes, Low Cost Networks and Gateways for teaching Data Communications, in "Proceedings of the 
20th Technical Conference on Computer Science Education (ACM SIGCSE '89)", Vol 21, No 1, 
pp6-11, Louisville, Kentucky, February 1989.

[LIO89] J. Lions, Computer Networks for Students, The University of New South Wales, School of Electrical
Engineering and Computer Science, TR-8909, Jul 1989, reprinted in "The [Australian] Computer 
Science Association Newsletter", Vol 3, No 1, pp37-48, Jul 1990. 

[McD91] C.S. McDonald, A Network Specification Language and Execution Environment for Undergraduate 
Teaching, Proceedings of the 22nd ACM Computer Science Education Technical Symposium '91, San 
Antonio, Texas, pp25-34, Mar 1991. 

[MIL92] D.L. Mills, Network Time Protocol (Version 3): Specification, implementation, and analysis, RFC-1305, Mar 
1992. 

[STA91] W. Stallings, "Data and Computer Communications", 3rd ed. Maxwell-Macmillan Publishers, 1991. 

[SUN89] Sun Microsystems Inc. Sun Programming Utilities and Libraries manual. Sun Release 4.1, Sep 1989. 

[TAN88] A.S. Tanenbaum, "Computer Networks", 2nd ed. Prentice-Hall, 1988. 



xnet(local) 


xnet(local) 


NAME 

xnet - a communications network simulation program 
SYNOPSIS 

xnet [many options] {TOPOLOGY_FILE | -r nnodes}

DESCRIPTION 

xnet is a program which enables experimentation with various data-link layer, network layer, routing and
transport layer networking protocols. In addition, different application and physical layers may be provided
which exhibit varying statistical characteristics of message generation and data transmission. xnet first
accepts a number of command line options and then either reads from a network topology file (described
later) or generates a random network. Having built and checked the network topology, xnet will compile
and dynamically link any source files specified on the command line or those files containing the
implementation of the protocols for each network node. Each node will execute its own copy of the network
protocols (as specified in either ANSI-C code or “old” K&R C). In particular, each node has its own set of
variables and will execute unique instances of event-handling functions when interesting network events
occur. Each node is initially rebooted by calling its specific reboot_node() function. Thereafter, each
node is expected to register event-handling functions to be invoked by the xnet round-robin scheduler when
interesting events occur.

xnet either displays the entire network on a window under OpenLook/X-windows or runs rather less visually
on an ASCII terminal. Under X-windows, nodes may be re-positioned and selected using the left
mouse button. Selecting a node results in a sub-window being displayed which contains the output and 
statistics panels of that node. The node’s attributes of message rates and sizes may be modified while the 
network is running by selecting choice buttons. Similarly, the default attributes of all nodes in the network 
may be simultaneously modified by selecting the default node panel from the bottom panel and then changing
the node attributes thereon. Selecting a node with the right mouse button displays a menu from which
the node may be forced to reboot, (impolitely) crash, (politely) shutdown and reboot, pause and (hardware) 
fail. 

Selecting a link using the left mouse button results in a sub-window being displayed which contains the 
statistics panel for that link. Links are bidirectional, so “selecting a link” means clicking on the link near 
the end of its source. The link-based attributes of costs and probabilities of error may be modified while 
the network is running by dragging slider panels. Similarly, the attributes of all links in the network may be 
simultaneously modified by first selecting the default link panel from the bottom panel and then changing 
the link attributes thereon. 

The current node and link attributes (as possibly modified via the windowing interface) are available to 
each node in C data structures and variables declared in xnet.h. These are initialized when each node is 
rebooted and updated as each node is scheduled for execution. The contents of these data structures should 
be considered as readonly values and not be modified directly by the protocols. They may be successfully 
modified via the windowing interface. 

COMMAND-LINE OPTIONS 

xnet supports a number of options which should be presented before the name of the topology file or a 
request for a random topology. 

-A filename.c 

Specify the name of a C file defining a new application layer (to be used as the source and sink of 
all messages). If -A is not provided, a default (internal) application layer is used. 

-d Provide some debugging information when xnet is running. 

-h Set the level of error diagnosis and reporting of calls to the heap management routines:

malloc(), calloc(), realloc(), valloc(), memalign(), cfree() and
free(). The routines abort with a message to the standard error if errors are detected in arguments
or in the heap. If a bad block is encountered, its address and size are included in the message.
The use of -h marginally slows xnet but is helpful when debugging allocation errors.




-H The same as -h except that the entire heap is examined on every call to the heap management routines.
The use of -H causes xnet to run painfully slowly.

-k Use “old” Kernighan and Ritchie (K&R) C to compile the user-supplied source files for the application
layer (-A), physical layer (-P) and the network protocol source files (-S). The default is to
use the ANSI-C compiler, gcc.


-Llibdir 

-llibname 

Include the indicated library directory name or library file name, such as -lm for the mathematics
library, in any necessary compilation and linking. Either of these options may be specified a number
of times if necessary.


-m mins 

To prevent runaway processes (and students) choking our teaching machines, a 3 minute time limit 
is silently imposed on xnet execution. This limit may be overridden with the -m option. 

-N Provide the number of nodes in the network in the C variable NNODES. Surprisingly, the default 
is that each node will not know how many nodes the network contains (NNODES = 0). 

-o file_prefix 

Copy the output of each node’s printf(), puts() and putchar() to a file with the indicated
prefix. For example, the use of -o output will typically create and write to the files
output.node0, output.node1, output.node2.

-p After building and checking the network topology, simply print it out and exit xnet. 

-P filenames 

Specify the name of a C file defining a new physical layer (to be used to deliver and possibly corrupt
and lose data frames). If -P is not provided, a default (internal) physical layer is used.

-r nnodes 

Request that a random network be generated, consisting of nnodes. Each node will have at least 
one link but the whole network is not (yet) guaranteed to be connected. The -r option may be 
used instead of providing a topology file. 

-R function_name 

Use function_name() as the function to first invoke when rebooting each node.

-s Print a summary of the network’s activity before xnet exits. 

-S filenames 

Specify the name of a C sourcefile to be used to implement the network protocols of each network 
node. If -S is not provided, the filename “protocol.c” is assumed. 

-t Trace all events delivered to each network node. A description of all function calls, arguments 
and return and xnet_errno values is reported via standard error. 

-T By default, xnet runs in “wall-clock” time, that is, the simulation performs one second of network-work
in one second of “wall-clock” time. This works well for up to about twenty nodes,
beyond which xnet “gets behind”. Using -T forces xnet to ignore the “wall-clock” time and
update its own clock in TIME_SCALE (10ms) increments.

-X Disable support for X-windows (the default for ASCII terminals!). 




NETWORK TOPOLOGIES 

Each point-to-point network consists of a number of nodes in a (not necessarily fully) connected graph. 
The nodes may be either hosts or routers. Host nodes have an application layer which is responsible for the
generation and receipt of messages to and from other hosts. Routers are otherwise identical to hosts but
have no application layer. Nodes have a number of attributes that may be specified in the topology file and,
in the case of hosts, the application layer message sizes and delivery rates may be modified under
X-windows. Using a combination of node attributes (specified in the topology file) and mouse and menu
selections under X-windows, nodes may be forced to reboot, (impolitely) crash, (politely) shutdown, pause 
and (hardware) fail. 

The links between nodes are unreliable, being subject to noise bursts and possible corruption and loss of 
data frames. The rates of reliability together with the propagation delay along links may be controlled with 
link attributes in the topology file or under X-windows. Links are connected to nodes via transmission 
buffers whose maximum size may similarly be controlled. Links may also have costs (weights) ascribed to 
them for each frame and byte transmitted. The links themselves never fail (“go down”) permanently. 

All information about the network is described in a network topology file, an example of which follows.

probframecorrupt = 4, probframeloss = 3,
minmessagesize = 1k, maxmessagesize = 4k
sourcefile = "protocol.c",

host perth {
    x=100, y=200,
    link to melbourne { costperframe = 8 }
}

host melbourne {
    x=270, y=200
    link to Sydney { costperframe = 1, propagationdelay = 1 },
    link to hobart { costperframe = 6 }
}

host brisbane {
    x=430, y=100
    link to Sydney { costperframe = 3 }
}

host hobart {
    x=400, y=300
    link to Sydney { costperframe = 5 },
    link to melbourne { costperframe = 3 }
}

host Sydney {
    x=470, y=200
    link to hobart { costperframe = 4 }
}


By default, every node in the network only knows its own node and link characteristics and the number of 
direct links it has to its neighbours. Unless the -N option is provided (or a constant hard-coded in C), each 
node is even unaware of how many nodes exist in the network. All other information must be derived by 
each node using its network protocols. 

NETWORK ATTRIBUTES 

A number of node and link attributes may be specified in the topology file to define and constrain the network.
Node or link attributes that are defined in the topology file before all node specifications become the
default, or global, attributes for all following nodes and links. Different attributes for individual nodes and 
links may thereafter be defined locally to override the default attributes. Under X-windows node and link 
attributes that may be modified while the network is running are presented on choice buttons and slider 
panels. Choice buttons whose description is “Default” and slider panels whose value is -1 indicate that 
the value of that attribute is taken from those of the default node or default link. The default node and link 
attributes may be seen and modified by selecting their panels with the bottom panel buttons. 




Attributes which define probabilities are expressed as one chance in a power of two of the event occurring. 
For example, a probframecorrupt of 4 indicates that there is one in sixteen (uniform distribution) chance of 
a data frame being corrupted on a link. Attributes which define times or rates represent multiples of xnet's 
TIME_SCALE (typically 10ms). For example, a propagationdelay expressed as 100 indicates a delay of 
one second. Alternatively, a propagationdelay expressed as 500ms indicates a delay of half a second.
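
The two conventions above amount to simple arithmetic, sketched here in C. The helper names are hypothetical and are not part of xnet; TIME_SCALE is taken to be its typical value of 10ms, and the treatment of an attribute value of 0 is deliberately left out because the text does not define it.

```c
#include <assert.h>

#define TIME_SCALE_MS 10L   /* assumed: xnet's TIME_SCALE, typically 10ms */

/* A probability attribute n means "one chance in 2^n";
 * probframecorrupt = 4 therefore gives one chance in 16. */
long one_chance_in(int n)
{
    assert(n >= 1 && n <= 30);
    return 1L << n;
}

/* A time attribute given as a bare number counts TIME_SCALE units. */
long ticks_to_ms(long ticks)
{
    return ticks * TIME_SCALE_MS;
}

/* A time attribute given with an "ms" suffix converts the other way. */
long ms_to_ticks(long ms)
{
    return ms / TIME_SCALE_MS;
}
```

For example, one_chance_in(4) yields 16, ticks_to_ms(100) yields 1000 (one second) and ms_to_ticks(500) yields 50 (half a second).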


Node attributes:

x, y             - integer coordinates of the node’s position under X-windows.

address          - the integer network address of each node.

outputfile       - the name of the output file for each node. When used as a global attribute, outputfile
                   is used as a filename prefix (as with the -o option). When used locally, outputfile is
                   used as is.

rebootnode       - the name of the ANSI-C function to call when each node reboots (locally overrides
                   the -R option).

sourcefile       - the name of the ANSI-C file containing the protocols for each node (locally overrides
                   the -S option).

messagerate      - the rate at which the application layer has a one-in-two chance of having a message
                   to deliver.

minmessagesize   - the minimum size (in bytes) of messages generated by the application layer.

maxmessagesize   - the maximum size (in bytes) of messages generated by the application layer.

mtbf             - the expected rate at which the node will fail due to a hardware failure.

repairtime       - the expected time taken to repair a hardware failure.

trace            - a boolean value indicating if event tracing is required (locally overrides the -t
                   option).


Link attributes:

propagationdelay - the propagation delay along a link.

probframecorrupt - the probability that a frame on this link will be corrupted.

probframeloss    - the probability that a frame on this link will be lost altogether.

costperbyte      - the cost (in cents) per byte along this link.

costperframe     - the cost (in cents) per frame along this link.


Note that all links are bidirectional - there is a link Sydney -> Melbourne although this is not explicitly
declared. Unless otherwise indicated, traffic costs are the same in both directions - the cost of
Sydney -> Melbourne is 1c per frame, but the Sydney -> Hobart costs are different in each direction. Note
also that the total cost of sending a frame from Melbourne to Hobart is cheaper if directed via Sydney.
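
The cost comparison above can be checked with a small worked example in C. The per-frame costs below are read from the example topology file as best it can be recovered (Melbourne to Hobart 6c, Melbourne to Sydney 1c, Sydney to Hobart 4c, Hobart to Sydney 5c, and so on); the table layout and the function path_cost() are assumptions made purely for this illustration.

```c
#include <assert.h>
#include <stddef.h>

enum { PERTH, MELBOURNE, BRISBANE, HOBART, SYDNEY, NCITIES };

/* cost[from][to] in cents per frame; 0 means no direct link.
 * Undeclared reverse directions inherit the declared cost. */
static const int cost[NCITIES][NCITIES] = {
    [PERTH][MELBOURNE]  = 8, [MELBOURNE][PERTH]  = 8,
    [MELBOURNE][SYDNEY] = 1, [SYDNEY][MELBOURNE] = 1,
    [MELBOURNE][HOBART] = 6, [HOBART][MELBOURNE] = 3,
    [BRISBANE][SYDNEY]  = 3, [SYDNEY][BRISBANE]  = 3,
    [HOBART][SYDNEY]    = 5, [SYDNEY][HOBART]    = 4
};

/* Sum the per-frame cost along a path of nnodes nodes;
 * returns -1 if any hop has no direct link. */
int path_cost(const int *path, int nnodes)
{
    int total = 0;

    assert(path != NULL && nnodes >= 1);
    for (int i = 0; i + 1 < nnodes; i++) {
        int c = cost[path[i]][path[i + 1]];
        if (c == 0)
            return -1;
        total += c;
    }
    return total;
}
```

With these figures the direct Melbourne to Hobart link costs 6c per frame, whereas Melbourne to Sydney to Hobart costs 1c + 4c = 5c.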


NETWORK LIBRARY FUNCTIONS 

xnet employs an event-driven programming style to indicate to each node that events associated with the 
network need attention. Events occur when a node reboots, when the application layer has a message for 
delivery, when the physical layer has received a frame on one of its links, when a software timer expires, 
when one of the five debugging buttons is selected from the node’s window and when a node is being 
(politely) shutdown. No event is delivered if a node pauses, crashes or suffers a hardware failure. 

Event-handling functions must be registered to process incoming events. The only exception to this is that
the function reboot_node() is assumed to be the function to invoke on node reboots (unless overridden
with either the -R option or the rebootnode attribute). Each handler is invoked with two parameters - the
event causing the invocation and a timestamp. The timestamp has significance for functions handling timer
events; all other handlers will simply receive the special NULLTIMESTAMP. Notice that timestamps do
not reflect the current time; they specify which timer has expired. If no events are pending for a node, any
handler registered for the special event EV_NULL is invoked.
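
The registration pattern described above might look roughly as follows. This is a self-contained sketch, not code that links against xnet: the type definitions, the stub event_handler() and the toy deliver() function are stand-ins for what xnet.h and the scheduler would supply, and the event names EV_REBOOT and EV_PHYSICALLAYER are assumed for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for declarations that xnet.h would supply. */
typedef enum {
    EV_NULL, EV_REBOOT, EV_APPLICATIONLAYER, EV_PHYSICALLAYER, EV_TIMER1, N_EVENTS
} NetEvent;
typedef long TimeStamp;
#define NULLTIMESTAMP ((TimeStamp)0)

typedef void (*Handler)(NetEvent ev, TimeStamp ts);
static Handler handlers[N_EVENTS];

/* Stub registration call: 0 on success, -1 on failure. */
int event_handler(NetEvent ev, Handler func)
{
    if ((int)ev < 0 || ev >= N_EVENTS)
        return -1;
    handlers[ev] = func;
    return 0;
}

/* Toy stand-in for the scheduler delivering one event to this node. */
void deliver(NetEvent ev, TimeStamp ts)
{
    assert(ev < N_EVENTS);
    if (handlers[ev] != NULL)
        handlers[ev](ev, ts);
}

int frames_seen = 0;
TimeStamp last_expired = NULLTIMESTAMP;

/* Non-timer handlers simply receive NULLTIMESTAMP. */
static void on_frame(NetEvent ev, TimeStamp ts)
{
    (void)ev; (void)ts;
    frames_seen++;
}

/* Timer handlers use the timestamp to learn which timer expired. */
static void on_timeout(NetEvent ev, TimeStamp ts)
{
    (void)ev;
    last_expired = ts;
}

/* Invoked on reboot; registers the handlers for all later events. */
void reboot_node(NetEvent ev, TimeStamp ts)
{
    (void)ev; (void)ts;
    event_handler(EV_PHYSICALLAYER, on_frame);
    event_handler(EV_TIMER1, on_timeout);
}
```

Calling reboot_node() and then deliver(EV_TIMER1, 42) in this sketch leaves last_expired equal to 42, mirroring how a timestamp identifies the expired timer rather than the current time.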

The following network library functions, presented here as ANSI-C prototypes, are supported by xnet. 
Their use, calling conventions and parameters are similar, though not identical, to those used in
Tanenbaum’s “Computer Networks”, 2nd edition.




Application layer functions: 

The application layer (either the internal default version or one provided with the -A option) has the 
responsibility of generating messages to be delivered to other application layers. An application layer will 
not generate a message for its own node. Notice that the required destination node is indicated by network 
address and not node number. Each node’s address and node number will in fact be the same, unless the 
address attribute is specified in the topology file. The default application layer prefers to generate messages
for “close nodes”, with a message having twice the chance of being for an immediate neighbour as
for a node two hops away (and so on). 

int read_application(int *destaddr, char *msg, int *len);

On invocation, len must point to an integer indicating the maximum number of bytes that may be 
copied into msg. On return, len will point to an integer now indicating the number of bytes 
copied into msg. The network address of the required destination node is copied into destaddr. 

int write_application(char *msg, int *len); 

Passes a number of bytes, pointed to by msg, “up to” the application layer. On invocation, len 
must point to an integer indicating the number of bytes to be taken from msg. On return, len will 
point to an integer now indicating the number of bytes accepted by the application layer. 

void enable_application(int destaddr); 

Permits the application layer to generate messages for the node with the indicated network 
address. Initially, message generation for all destination nodes is disabled and must be enabled to 
begin the generation of messages. 

void enable_all_applications(void); 

Permits messages to be generated for all destination nodes. 

void disable_application(int destaddr); 

Prevents the application layer from generating new messages for the node with the indicated network
address. This function should be called when a harried node runs out of buffer space, or perhaps
while routing information is being gathered.

void disable_all_applications(void); 

Disables the generation of all messages for all destination nodes. 
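
The in/out use of len and the throttling calls above can be sketched as follows. The bodies of read_application(), enable_application() and disable_application() here are simple stubs written for this example (the real ones belong to xnet), and the one-slot buffer and the function fetch_message() are hypothetical.

```c
#include <assert.h>
#include <string.h>

#define MAX_MSG   64
#define MAX_NODES 8

/* --- stubs standing in for xnet's application layer --- */
static const char pending[] = "hello";     /* message waiting to be sent */
int enabled[MAX_NODES];                    /* all destinations start disabled */

void enable_application(int destaddr)  { enabled[destaddr] = 1; }
void disable_application(int destaddr) { enabled[destaddr] = 0; }

int read_application(int *destaddr, char *msg, int *len)
{
    int n = (int)sizeof pending;
    if (*len < n)                  /* caller's buffer is too small */
        return -1;
    memcpy(msg, pending, (size_t)n);
    *len = n;                      /* out: bytes actually copied */
    *destaddr = 3;                 /* out: destination network address */
    return 0;
}

/* --- protocol side: a one-slot buffer with flow control --- */
char buffer[MAX_MSG];
int  buffered_len  = 0;
int  buffered_dest = -1;

int fetch_message(void)
{
    int dest;
    int len = MAX_MSG;             /* in: maximum bytes we can accept */

    if (read_application(&dest, buffer, &len) != 0)
        return -1;
    buffered_len  = len;
    buffered_dest = dest;
    disable_application(dest);     /* buffer full: throttle this destination */
    return 0;
}
```

Once the buffered message has been delivered, the protocol would call enable_application() again for that destination to resume message generation.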


Physical layer functions: 

The physical layer (either the internal default version or one provided with the -P option) has the responsibility
of delivering data frames between nodes. Frames are delivered along links, numbered within each
node from 1 to the number of links. Link 0 is provided to copy a frame immediately from a node’s output 
to input. In general, the physical layer will randomly corrupt and drop data frames on all links other than 0. 

int write_physical(int link, char *frame, int *len); 

Passes a number of bytes, pointed to by frame, “down to” the physical layer which will attempt
to deliver them on the indicated link (wire). Each node has a fixed number of links; the first
available link is number 1, the second is number 2, and so on. As a special case, a node may reliably
transmit a frame to itself by requesting the LOOPBACK link. On invocation, len must point
to an integer indicating the number of bytes to be taken from frame. On return, len will point 
to an integer now indicating the number of bytes accepted by the physical layer. 

int write_physical_reliable(int link, char *frame, int *len); 

Identical to write_physical() though the transmission is guaranteed to be error free (providing
a reliable data-link layer).

int write_direct(int destaddr, char *msg, int *len);

Similar to write_physical_reliable() but the network address of the required destination
node may be requested (providing a reliable network/routing layer for asynchronous message
passing). Messages transmitted using write_direct() are considered transmitted on and
arrive on link number 1. The special destination address BROADCAST may be used to transmit a
message to all nodes except the sender.

int read_physical(int *link, char *frame, int *len);

Accepts the specified maximum number of bytes from the physical layer, placing them in the
address pointed to by frame. On invocation, len must point to an integer indicating the maximum
number of bytes that may be copied into frame. On return, len will point to an integer
now indicating the number of bytes taken from the physical layer and link will point to an integer
indicating on which link they were received.
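
A data-link layer built on these calls typically wraps each message in a frame carrying a checksum. The sketch below shows the idea over the loopback link only. It is not xnet code: write_physical() and read_physical() are replaced by a one-frame stub, the Frame layout is hypothetical, and a deliberately simple additive checksum stands in for the library's checksum functions.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define LOOPBACK  0
#define MAX_FRAME 128

/* --- stub physical layer: holds a single frame on the loopback link --- */
char wire[MAX_FRAME];
int  wire_len = 0;

int write_physical(int link, char *frame, int *len)
{
    if (link != LOOPBACK || *len < 0 || *len > MAX_FRAME)
        return -1;
    memcpy(wire, frame, (size_t)*len);
    wire_len = *len;
    return 0;
}

int read_physical(int *link, char *frame, int *len)
{
    if (wire_len == 0 || *len < wire_len)
        return -1;
    memcpy(frame, wire, (size_t)wire_len);
    *len = wire_len;               /* out: bytes taken from the link */
    *link = LOOPBACK;              /* out: link the frame arrived on */
    wire_len = 0;
    return 0;
}

/* --- hypothetical frame format: checksum, payload length, payload --- */
typedef struct {
    unsigned short checksum;
    unsigned short len;
    char           data[64];
} Frame;

static unsigned short additive_sum(const unsigned char *p, int n)
{
    unsigned int sum = 0;
    while (n-- > 0)
        sum = (sum + *p++) & 0xFFFFu;
    return (unsigned short)sum;
}

int send_frame(int link, const char *msg, int msglen)
{
    Frame f;
    int flen;

    if (msglen < 0 || msglen > (int)sizeof f.data)
        return -1;
    memset(&f, 0, sizeof f);
    f.len = (unsigned short)msglen;
    memcpy(f.data, msg, (size_t)msglen);
    flen = (int)offsetof(Frame, data) + msglen;
    f.checksum = additive_sum((unsigned char *)&f, flen);  /* field is 0 here */
    return write_physical(link, (char *)&f, &flen);
}

/* Returns the payload length, or -1 if nothing arrived or it was corrupt. */
int receive_frame(char *msg, int maxlen)
{
    Frame f;
    int link, flen = (int)sizeof f;
    unsigned short got;

    memset(&f, 0, sizeof f);
    if (read_physical(&link, (char *)&f, &flen) != 0)
        return -1;
    got = f.checksum;
    f.checksum = 0;
    if (additive_sum((unsigned char *)&f, flen) != got || f.len > maxlen)
        return -1;
    memcpy(msg, f.data, f.len);
    return f.len;
}
```

On a real (unreliable) link the receiver would discard frames for which receive_frame() reports corruption and rely on a timer at the sender to trigger retransmission.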


Timer functions: 

A total of 10 software timer queues are supported to provide a call-back mechanism for user code. For 
example, when a data frame is transmitted a timer is typically created which will expire sometime after that 
frame’s acknowledgement is expected. Timers are referenced via unique integers termed timestamps. 
When a timer expires, the event handler for the corresponding event is invoked with the event and unique 
timestamp as parameters. Timers may be cancelled to prevent them expiring (for example, if the acknowledgement
frame arrives before the timer expires). Timers are automatically cancelled as a result of their
handler being invoked. 

TimeStamp add_timer(NetEvent ev, long msec); 

Requests that a new timer be created which will expire in the indicated number of milliseconds. 
One of the ten timer queues may be requested by passing one of the event types
EV_TIMER1..EV_TIMER10. A unique timestamp is returned to distinguish this newly created
timer from all others. This timestamp should be used in subsequent calls to cancel_timer().
If a new timer cannot be created, the special timestamp NULLTIMESTAMP will
be returned.

int cancel_timer(NetEvent ev, TimeStamp ts);

Requests that the indicated timer be cancelled (before it has expired). 
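
The usual pairing of add_timer() and cancel_timer() around an acknowledgement can be sketched as below. The two timer functions are stubbed with a small table so that the example is self-contained; their bodies, the type definitions and the helper names frame_sent() and ack_received() are assumptions for illustration, not xnet's implementation.

```c
#include <assert.h>

/* Hypothetical stand-ins for declarations that xnet.h would supply. */
typedef enum { EV_TIMER1 = 1, EV_TIMER2 } NetEvent;
typedef long TimeStamp;
#define NULLTIMESTAMP ((TimeStamp)0)
#define MAX_TIMERS 16

static struct {
    TimeStamp ts;
    NetEvent  ev;
    long      msec;
    int       active;
} timers[MAX_TIMERS];
static TimeStamp next_ts = 1;

/* Stub: create a timer, returning its unique timestamp. */
TimeStamp add_timer(NetEvent ev, long msec)
{
    for (int i = 0; i < MAX_TIMERS; i++) {
        if (!timers[i].active) {
            timers[i].ts = next_ts++;
            timers[i].ev = ev;
            timers[i].msec = msec;
            timers[i].active = 1;
            return timers[i].ts;
        }
    }
    return NULLTIMESTAMP;          /* no free timer */
}

/* Stub: cancel a pending timer; 0 on success, -1 if not found. */
int cancel_timer(NetEvent ev, TimeStamp ts)
{
    for (int i = 0; i < MAX_TIMERS; i++) {
        if (timers[i].active && timers[i].ev == ev && timers[i].ts == ts) {
            timers[i].active = 0;
            return 0;
        }
    }
    return -1;
}

/* --- protocol side --- */
TimeStamp ack_timer = NULLTIMESTAMP;

/* After transmitting a data frame, wait up to 500ms for its acknowledgement. */
void frame_sent(void)
{
    ack_timer = add_timer(EV_TIMER1, 500);
}

/* The acknowledgement arrived in time, so the timer must not expire. */
int ack_received(void)
{
    int result = cancel_timer(EV_TIMER1, ack_timer);
    ack_timer = NULLTIMESTAMP;
    return result;
}
```

If the acknowledgement never arrives, the EV_TIMER1 handler would instead be invoked with the saved timestamp and would normally retransmit the frame.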

Other functions: 

int event_handler(NetEvent ev, void (*func)()); 

Register func as the void “returning” function to be invoked when the event ev occurs. 

int set_debug_string(NetEvent ev, char *str); 

Change the string on the indicated X-windows debug button (for EV_DEBUG1..EV_DEBUG5) to
str. If str is NULL or the empty string, the indicated button is removed.

int set_time_of_day(long newsec, long newms);

Change the node’s notion of the wall-clock time of day. As nodeinfo should be considered a read-only
structure, this is the only method to set time_of_day.sec and time_of_day.ms.
(time_of_day.sec is the number of seconds since Jan. 1, 1970, and may be used in a call to
ctime(3c)).

int move_cursor(int row, int col);

void clear_to_eoln(void), clear_to_eos(void);

Under X-windows, functions to move the cursor of each node’s output window to the indicated
row and column (home is (0,0)), clear to the end of the current line and clear to the end of the screen.

int checksum_internet(unsigned short *addr, int nbytes);
unsigned short checksum_ccitt(unsigned char *addr, int nbytes);
unsigned short checksum_crc16(unsigned char *addr, int nbytes);

Functions which take a memory address and perform the specified checksum calculation on the 
indicated number of bytes starting from that address. 
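
For readers unfamiliar with CRC-style checksums, here is a minimal bitwise sketch of one common CCITT variant (polynomial 0x1021, initial value 0xFFFF, no bit reflection). Which variant checksum_ccitt() actually implements is not stated here, so the parameters and the function name crc16_ccitt_sketch() are assumptions and the result may differ from xnet's.

```c
#include <assert.h>
#include <stddef.h>

/* Bitwise CRC-16 using the CCITT polynomial x^16 + x^12 + x^5 + 1.
 * Assumed parameters: initial value 0xFFFF, most significant bit first. */
unsigned short crc16_ccitt_sketch(const unsigned char *addr, size_t nbytes)
{
    unsigned int crc = 0xFFFFu;

    assert(addr != NULL || nbytes == 0);
    while (nbytes--) {
        crc ^= (unsigned int)*addr++ << 8;      /* bring in the next byte */
        for (int bit = 0; bit < 8; bit++) {
            if (crc & 0x8000u)
                crc = ((crc << 1) ^ 0x1021u) & 0xFFFFu;
            else
                crc = (crc << 1) & 0xFFFFu;
        }
    }
    return (unsigned short)crc;
}
```

With these parameters the nine ASCII characters "123456789" produce 0x29B1, the commonly quoted check value for this variant.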

Function restrictions: 

Note that calls to the functions write_application(), read_application(),
enable_application(), enable_all_applications(), disable_application(),
disable_all_applications() and event_handler(EV_APPLICATIONLAYER, ...) are
invalid if called from a router node.

OUTPUT AND ERROR HANDLING 

All output requested with printf(), puts() and putchar() will appear in the output window of
the appropriate node. Any output explicitly directed to C’s stdout or stderr will appear on the invoking
tty (or xterm window), unless redirected by the shell. If either the -o option or the outputfile attribute
is provided, each node’s output will also be copied to the indicated file.

Most library functions return the integer 0 on success and the integer -1 on failure. The most recent error 
status is reflected in the global variable xnet_errno. All values of xnet_errno will be instances of 
the enumerated type NetError (defined in xnet.h). Errors may be reported to standard error with 
xnet_perror() and their error message strings accessed via *xnet_errstr[].
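
The return-value convention above can be sketched as follows. Every name in this block (DemoError, demo_errno, demo_errstr, demo_set_label, demo_try_label) is a hypothetical stand-in so that the example compiles without xnet.h; the real NetError values and messages will differ, and whether xnet resets its error status on success is an assumption here.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-ins for NetError, xnet_errno and xnet_errstr[]. */
typedef enum { DEMO_OK = 0, DEMO_BADARG, DEMO_NERRORS } DemoError;

DemoError demo_errno = DEMO_OK;
const char *demo_errstr[DEMO_NERRORS] = { "no error", "invalid argument" };

/* A made-up library call following the 0 on success / -1 on failure rule. */
int demo_set_label(char *dst, size_t dstlen, const char *src)
{
    if (dst == NULL || src == NULL || strlen(src) >= dstlen) {
        demo_errno = DEMO_BADARG;      /* record why the call failed */
        return -1;
    }
    strcpy(dst, src);                  /* safe: src fits, including the NUL */
    demo_errno = DEMO_OK;              /* assumption: success clears the status */
    return 0;
}

/* Caller-side pattern: test the return value, then consult the error status. */
int demo_try_label(char *dst, size_t dstlen, const char *src)
{
    if (demo_set_label(dst, dstlen, src) == -1) {
        fprintf(stderr, "demo_set_label: %s\n", demo_errstr[demo_errno]);
        return -1;
    }
    return 0;
}
```

In a real protocol file the same shape would apply to any library call: compare the result with -1, then report the failure with xnet_perror() or look up the message through xnet_errno.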

X-WINDOW DEFAULTS 

As well as the standard X-windows and OpenLook default command line arguments, the following defaults
may be specified in your ~/.Xdefaults file to define and constrain the execution of xnet. Each resource is
presented with its minimum, default and maximum values in parentheses.

xnet.x, xnet.y (0,100,800) 

The starting coordinates of xnet's baseframe. 

xnet.debug (False) 

Similar to, though may be overridden by, the -d option.

xnet.mins (1,3,(1<<30))

Similar to, though may be overridden by, the -m mins option. 

xnet.random (2,unused,100) 

Similar to, though may be overridden by, the -r nnodes option. 

xnet.report (5,20,1000) 

How frequently (in seconds) to report the scheduler’s activity iff -d is specified. 

xnet.trace (False) 

Similar to, though may be overridden by, the -t option. 

xnet.showstats (False) 

Indicate if xnet's statistics window should be displayed initially. 

xnet.statsx, xnet.statsy (0,100,800) 

The starting coordinates of xnet's statistics window. 

EXIT STATUS 

xnet will exit with a status of 0 if either the “Kill xnet” button is selected or the time limit expires. Any other
conditions, such as an error in the topology file or a syntax error in the protocol source files, will result in xnet
exiting with a status of 1.

FILES 

Some example material appears in /home/kultarr/year3/examples/networks.

The standard header file which is included by xnet itself, and should be included in your protocol files, is in
/usr/kultarr/lib/xnet.h



SEE ALSO 

[McD91] "A Network Specification Language and Execution Environment for Undergraduate Teaching",
C.S. McDonald, Proceedings of the ACM Computer Science Education Technical
Symposium ’91, San Antonio, Texas, Mar 1991, pp. 25-34.

LIMITATIONS 

xnet refuses to compile files with the word receive spelt incorrectly. 

Sun-Patch-ID#100257-03 (4-Oct-91) must be installed under SunOS 4.1.1 (and earlier) for xnet to work 
correctly there. This patch is not needed for 4.1.2 and beyond. 

Only one application layer type and one physical layer type may be specified, with -A and -P respectively. 
It is not possible for each node to use a different application layer or physical layer. 

As xnet uses the X-windows environment under UNIX, the protocols must not use the following UNIX
system and library functions:

alarm(3), getitimer(3), ioctl(2), setitimer(2), sigblock(2), sigmask(2), signal(3), sigvec(2), system(3), 
wait(2), wait3(2) or perform any UNIX I/O that may block. 

AUTHOR 

Chris McDonald (The University of Western Australia, chris@cs.uwa.edu.au) 

Thanks to John Hine (hine@comp.vuw.ac.nz), Duncan McEwan (duncan@comp.vuw.ac.nz) and Chris 
Pudney (chrisp@cs.uwa.edu.au). 



A History of Unix 

Greg Rose 

Australian Computing and Communications Institute 


ABSTRACT 


(aka A History of UNIX, A History of the UNIX(tm) operating system, UNIX is a trademark of Bell 
Laboratories, I mean AT&T, that is AT&T Bell Laboratories, errr UNIX System Laboratories, in the 
United States and other countries, at least some of them.) 

Unix was conceived nearly a quarter of a century ago, and has followed an amazingly consistent 
exponential growth path ever since. This means that the average length of time an individual has been 
involved with Unix is constant, and about two years. Thus, the average person knows little of the actual 
history of Unix. Much of the UNIX operating system, whichever version you use, has its roots either in 
the evolutionary process leading to the current version, or in the political environment prevailing while 
that evolution was happening. By exploring the history of this development, a number of otherwise 
obscure things about Unix become much more clear, for example, why the version numbers of the two 
major variants are asymptotically approaching the next integer. 

The presentation will give one long-distance view of this developmental process. The speaker has been
involved with Unix since 1975, and has had to live through many of these developments.











Unix Goes to School 



PDP-11, Interdata 8/32 






ACM Turing Award for Ritchie and Thompson (1984) 



Promotion to a Standard 
1986-present 




88-Open, SPARC International, RiscWare, 



What Version Was That? 



Xenix 


PWB 


SunOS 


OSF/1 





























































SYDNEY MELBOURNE 

(02) 879 9500 (03) 882 8211




TRADE IN SUN OR APOLLO FOR AN INDIGO 

The best value-for-money workstation, as measured by Unix Review, just 
became even cheaper for owners of Sun and Apollo workstations. 

Until May 31st, 1993, Silicon Graphics will accept any Sun or Apollo 
workstation as part payment for an IRIS Indigo on a trade-in deal unmatched by 
any other vendor in the marketplace. 

The Indigo bundle includes disk, memory and any of Indigo’s extensive
graphics accelerators, as well as:

• Explorer: a visual application developer

• Workspace: a visual work environment

• Showcase: a multimedia presentation application

• Media Mosaic: a set of tools for capturing, editing and presenting
video, audio, images and graphics

• NFS: Network File System

To take advantage of this extraordinary once-only offer call your local
Silicon Graphics office today and trade up to the world’s most
productive personal workstation.




Silicon Graphics
Computer Systems

BRISBANE
(07) 257 ...


ADELAIDE 
(08) 379 7333 


PERTH 

(09) 321 4595 







